The generalization performance of deep neural networks with respect to the
optimization algorithm is one of the major concerns in machine learning, and it
can be affected by various factors. In this paper, we theoretically prove that
the Lipschitz constant of the loss function is an important factor in reducing
the generalization error of the model produced by Adam or AdamW. The results
can serve as a guideline for choosing the loss function when the optimization
algorithm is Adam or AdamW. In addition, to evaluate the theoretical bound in a
practical setting, we consider the human age estimation problem from computer
vision. To assess generalization better, the training and test datasets are
drawn from different distributions. Our experimental evaluation shows that a
loss function with a lower Lipschitz constant and maximum value improves the
generalization of models trained by Adam or AdamW.
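As an illustrative sketch (ours, not the paper's analysis), the Lipschitz constant of a loss with respect to the prediction error can be estimated numerically on a bounded range; MAE stays at 1 while MSE's constant grows with the range, the kind of difference the result above distinguishes:

```python
# Estimate the Lipschitz constant of a scalar loss on a bounded
# prediction-error range via the maximum slope between sample points.
def lipschitz_estimate(loss, lo=-2.0, hi=2.0, n=2001):
    xs = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    return max(abs(loss(b) - loss(a)) / (b - a) for a, b in zip(xs, xs[1:]))

mae = lambda e: abs(e)   # Lipschitz constant 1 on any range
mse = lambda e: e * e    # Lipschitz constant 2*|e|, i.e. 4 on [-2, 2]

print(lipschitz_estimate(mae))  # ~1.0
print(lipschitz_estimate(mse))  # ~4.0
```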
( 2 min )
Despite the dominance and effectiveness of scaling, resulting in large
networks with hundreds of billions of parameters, the necessity to train
overparametrized models remains poorly understood, and alternative approaches
do not necessarily make it cheaper to train high-performance models. In this
paper, we explore low-rank training techniques as an alternative approach to
training large neural networks. We introduce a novel method called ReLoRA,
which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to
pre-training transformer language models with up to 350M parameters and
demonstrate comparable performance to regular neural network training.
Furthermore, we observe that the efficiency of ReLoRA increases with model
size, making it a promising approach for training multi-billion-parameter
networks efficiently. Our findings shed light on the potential of low-rank
training techniques and their implications for scaling laws.
( 2 min )
We present APAC-Net, an alternating population and agent control neural
network for solving stochastic mean field games (MFGs). Our algorithm is geared
toward high-dimensional instances of MFGs that are beyond reach with existing
solution methods. We achieve this in two steps. First, we take advantage of the
underlying variational primal-dual structure that MFGs exhibit and phrase it as
a convex-concave saddle point problem. Second, we parameterize the value and
density functions by two neural networks, respectively. By phrasing the problem
in this manner, solving the MFG can be interpreted as a special case of
training a generative adversarial network (GAN). We show the potential of our
method on up to 100-dimensional MFG problems.
( 2 min )
Federated learning (FL) has evolved into a prominent method for edge devices to
cooperatively build a unified prediction model while keeping their sensitive
training data local to the device. Despite the existence of numerous research
frameworks for simulating FL algorithms, they do not facilitate comprehensive
deployment for automatic speech recognition tasks on heterogeneous edge
devices. This is where Ed-Fed, a comprehensive and generic FL framework, comes
in as a foundation for future practical FL system research. We also propose a
novel resource-aware client selection algorithm to optimise the waiting time in
the FL settings. We show that our approach can handle the straggler devices and
dynamically set the training time for the selected devices in a round. Our
evaluation has shown that the proposed approach significantly optimises waiting
time in FL compared to conventional random client selection methods.
( 2 min )
The current cut selection algorithm used in mixed-integer programming solvers
has remained largely unchanged since its creation. In this paper, we propose a
set of new cut scoring measures, cut filtering techniques, and stopping
criteria, extending the current state-of-the-art algorithm and obtaining a 4%
performance improvement for SCIP over the MIPLIB 2017 benchmark set.
( 2 min )
Controlling nonlinear dynamical systems using machine learning makes it
possible not only to drive systems into simple behavior such as periodicity but
also into more complex, arbitrary dynamics. For this, it is crucial that a
machine learning system can be trained to reproduce the target dynamics
sufficiently well. Using the example of forcing a chaotic parametrization of
the Lorenz system into intermittent dynamics, we first show that classical
reservoir computing excels at this task. In a next step, we compare these
results, based on different amounts of training data, to an alternative setup
in which next-generation reservoir computing is used instead. It turns out that
while delivering comparable performance for usual amounts of training data,
next-generation RC significantly outperforms classical RC in situations where
only very limited data is available. This opens up further practical control
applications in real-world problems where data is restricted.
( 2 min )
Recent advances in large language models have led to renewed interest in
natural language processing in healthcare using the free text of clinical
notes. One distinguishing characteristic of clinical notes is that they span a
long time period across multiple long documents. The unique structure of clinical notes
creates a new design choice: when the context length for a language model
predictor is limited, which part of clinical notes should we choose as the
input? Existing studies either choose the inputs with domain knowledge or
simply truncate them. We propose a framework to analyze the sections with high
predictive power. Using MIMIC-III, we show that: 1) predictive power
distribution is different between nursing notes and discharge notes and 2)
combining different types of notes could improve performance when the context
length is large. Our findings suggest that a carefully selected sampling
function could enable more efficient information extraction from clinical
notes.
( 2 min )
We propose a novel task-agnostic in-domain pre-training method that sits
between generic pre-training and fine-tuning. Our approach selectively masks
in-domain keywords, i.e., words that provide a compact representation of the
target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We
evaluate our approach using six different settings: three datasets combined
with two distinct pre-trained language models (PLMs). Our results reveal that
the fine-tuned PLMs adapted using our in-domain pre-training strategy
outperform PLMs that used in-domain pre-training with random masking as well as
those that followed the common pre-train-then-fine-tune paradigm. Further, the
overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the
pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).
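The selective-masking step can be sketched as follows (our minimal illustration, not the authors' implementation; in the paper the keyword set comes from KeyBERT, here it is hand-picked):

```python
# Minimal sketch of selective in-domain masking: instead of masking tokens
# uniformly at random, mask only occurrences of in-domain keywords (in the
# paper these come from KeyBERT; here the set is hand-picked for illustration).
def mask_keywords(tokens, keywords, mask_token="[MASK]"):
    return [mask_token if t.lower() in keywords else t for t in tokens]

tokens = "The patient was diagnosed with acute myocardial infarction".split()
print(mask_keywords(tokens, {"acute", "myocardial", "infarction"}))
# ['The', 'patient', 'was', 'diagnosed', 'with', '[MASK]', '[MASK]', '[MASK]']
```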
( 2 min )
Understanding how the statistical and geometric properties of neural activity
relate to performance is a key problem in theoretical neuroscience and deep
learning. Here, we calculate how correlations between object representations
affect the capacity, a measure of linear separability. We show that for
spherical object manifolds, introducing correlations between centroids
effectively pushes the spheres closer together, while introducing correlations
between the axes effectively shrinks their radii, revealing a duality between
correlations and geometry with respect to the problem of classification. We
then apply our results to accurately estimate the capacity of deep network
data.
( 2 min )
We analyze statistical discrimination in hiring markets using a multi-armed
bandit model. Myopic firms face workers arriving with heterogeneous observable
characteristics. The association between the worker's skill and characteristics
is unknown ex ante; thus, firms need to learn it. Laissez-faire causes
perpetual underestimation: minority workers are rarely hired, and therefore,
the underestimation tends to persist. Even a marginal imbalance in the
population ratio frequently results in perpetual underestimation. We propose
two policy solutions: a novel subsidy rule (the hybrid mechanism) and the
Rooney Rule. Our results indicate that temporary affirmative actions
effectively alleviate discrimination stemming from insufficient data.
( 2 min )
Hypothesis transfer learning (HTL) contrasts with domain adaptation by
allowing a previous task, named the source, to be leveraged in a new one, the
target, without requiring access to the source data. Indeed, HTL relies only on
a hypothesis learnt from the source data, relieving the hurdle of expensive
data storage and providing great practical benefits. Hence, HTL is highly
beneficial for real-world applications relying on big data. The analysis of such a method
from a theoretical perspective faces multiple challenges, particularly in
classification tasks. This paper deals with this problem by studying the
learning theory of HTL through algorithmic stability, an attractive theoretical
framework for machine learning algorithms analysis. In particular, we are
interested in the statistical behaviour of the regularized empirical risk
minimizers in the case of binary classification. Our stability analysis
provides learning guarantees under mild assumptions. Consequently, we derive
several complexity-free generalization bounds for essential statistical
quantities like the training error, the excess risk and cross-validation
estimates. These refined bounds allow us to understand the benefits of transfer
learning and to compare the behaviour of standard losses in different scenarios,
leading to valuable insights for practitioners.
( 2 min )
Understanding the implicit regularization imposed by neural network
architectures and gradient based optimization methods is a key challenge in
deep learning and AI. In this work we provide sharp results for the implicit
regularization imposed by the gradient flow of Diagonal Linear Networks (DLNs)
in the over-parameterized regression setting and, potentially surprisingly,
link this to the phenomenon of phase transitions in generalized hardness of
approximation (GHA). GHA generalizes the phenomenon of hardness of
approximation from computer science to, among others, continuous and robust
optimization. It is well-known that the $\ell^1$-norm of the gradient flow of
DLNs with tiny initialization converges to the objective function of basis
pursuit. We improve upon these results by showing that the gradient flow of
DLNs with tiny initialization approximates minimizers of the basis pursuit
optimization problem (as opposed to just the objective function), and we obtain
new and sharp convergence bounds w.r.t.\ the initialization size. Non-sharpness
of our results would imply that the GHA phenomenon would not occur for the
basis pursuit optimization problem -- which is a contradiction -- thus implying
sharpness. Moreover, we characterize $\textit{which}$ $\ell_1$ minimizer of the
basis pursuit problem is chosen by the gradient flow whenever the minimizer is
not unique. Interestingly, this depends on the depth of the DLN.
( 3 min )
In this paper, we investigate the impact of numerical instability on the
reliability of sampling, density evaluation, and evidence lower bound (ELBO)
estimation in variational flows. We first empirically demonstrate that common
flows can exhibit a catastrophic accumulation of error: the numerical flow map
deviates significantly from the exact map -- which affects sampling -- and the
numerical inverse flow map does not accurately recover the initial input --
which affects density and ELBO computations. Surprisingly though, we find that
results produced by flows are often accurate enough for applications despite
the presence of serious numerical instability. In this work, we treat
variational flows as dynamical systems, and leverage shadowing theory to
elucidate this behavior via theoretical guarantees on the error of sampling,
density evaluation, and ELBO estimation. Finally, we develop and empirically
test a diagnostic procedure that can be used to validate results produced by
numerically unstable flows in practice.
( 2 min )
Large-language models (LLMs) such as GPT-4 caught the interest of many
scientists. Recent studies suggested that these models could be useful in
chemistry and materials science. To explore these possibilities, we organized a
hackathon.
This article chronicles the projects built as part of this hackathon.
Participants employed LLMs for various applications, including predicting
properties of molecules and materials, designing novel interfaces for tools,
extracting knowledge from unstructured data, and developing new educational
applications.
The diverse topics and the fact that working prototypes could be generated in
less than two days highlight that LLMs will profoundly impact the future of our
fields. The rich collection of ideas and projects also indicates that the
applications of LLMs are not limited to materials science and chemistry but
offer potential benefits to a wide range of scientific disciplines.
( 3 min )
PIGINet leverages machine learning to streamline and enhance household robots' task and motion planning, by assessing and filtering feasible solutions in complex environments.
( 9 min )
A new report by MIT researchers highlights the potential of generative AI to help workers with certain writing assignments.
( 9 min )
We study the adaption of soft actor-critic (SAC) from continuous action space
to discrete action space. We revisit vanilla SAC and provide an in-depth
understanding of its Q value underestimation and performance instability issues
when applied to discrete settings. We thereby propose entropy-penalty and
double average Q-learning with Q-clip to address these issues. Extensive
experiments on typical benchmarks with discrete action space, including Atari
games and a large-scale MOBA game, show the efficacy of our proposed method.
Our code is available at: https://github.com/coldsummerday/Revisiting-Discrete-SAC.
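A hedged sketch of the target computation these fixes suggest (our reading for illustration, not the released code; see the repository above for the actual implementation): average the two Q-estimates rather than taking their minimum, then clip the result into a trust region around the previous target:

```python
# Hedged sketch (not the authors' released code): form the bootstrap target
# from the *average* of two Q-estimates instead of their minimum (to counter
# underestimation), then clip it into a trust region around the previous
# target values ("Q-clip") to damp instability.
def averaged_clipped_target(q1, q2, prev_target, clip=0.5):
    avg = [(a + b) / 2 for a, b in zip(q1, q2)]      # double average
    return [min(max(v, p - clip), p + clip)          # Q-clip
            for v, p in zip(avg, prev_target)]

print(averaged_clipped_target([1.0, 2.0], [3.0, 0.0], [1.5, 1.5]))  # [2.0, 1.0]
```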
( 2 min )
The coupling of deep reinforcement learning to numerical flow control
problems has recently received considerable attention, leading to
groundbreaking results and opening new perspectives for the domain. Due to the
usually high computational cost of fluid dynamics solvers, the use of parallel
environments during the learning process represents an essential ingredient to
attain efficient control in a reasonable time. Yet, most of the deep
reinforcement learning literature for flow control relies on on-policy
algorithms, for which the massively parallel transition collection may break
theoretical assumptions and lead to suboptimal control models. To overcome this
issue, we propose a parallelism pattern relying on partial-trajectory buffers
terminated by a return bootstrapping step, allowing a flexible use of parallel
environments while preserving the on-policiness of the updates. This approach
is illustrated on a CPU-intensive continuous flow control problem from the
literature.
( 2 min )
When measuring rare processes at Belle II, a huge luminosity is required,
which means a large number of simulations are necessary to determine signal
efficiencies and background contributions. However, this process demands high
computation costs, while most of the simulated data, particularly in the case of
background, are discarded by the event selection. Thus, filters using graph
neural networks are introduced at an early stage to avoid spending resources on
the detector simulation and reconstruction of events discarded at analysis level.
In our work, we improved the performance of the filters using graph attention
and investigated statistical methods including sampling and reweighting to deal
with the biases introduced by the filtering.
( 2 min )
For prediction of clustered time-to-event data, we propose a new deep neural
network based gamma frailty model (DNN-FM). An advantage of the proposed model
is that the joint maximization of the new h-likelihood provides maximum
likelihood estimators for fixed parameters and best unbiased predictors for
random frailties. Thus, the proposed DNN-FM is trained by using a negative
profiled h-likelihood as a loss function, constructed by profiling out the
non-parametric baseline hazard. Experimental studies show that the proposed
method enhances the prediction performance of the existing methods. A real data
analysis shows that the inclusion of subject-specific frailties helps to
improve prediction of the DNN based Cox model (DNN-Cox).
( 2 min )
The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient
training algorithms designed to improve training, validation, and downstream
performance faster than standard training. In this work, we revisit three
categories of such algorithms: dynamic architectures (layer stacking, layer
dropping), batch selection (selective backprop, RHO loss), and efficient
optimizers (Lion, Sophia). When pre-training BERT and T5 with a fixed
computation budget using such methods, we find that their training, validation,
and downstream gains vanish compared to a baseline with a fully-decayed
learning rate. We define an evaluation protocol that enables computation to be
done on arbitrary machines by mapping all computation time to a reference
machine which we call reference system time. We discuss the limitations of our
proposed protocol and release our code to encourage rigorous research in
efficient training procedures: https://github.com/JeanKaddour/NoTrainNoGain.
( 2 min )
Recently Chen and Poor initiated the study of learning mixtures of linear
dynamical systems. While linear dynamical systems already have wide-ranging
applications in modeling time-series data, using mixture models can lead to a
better fit or even a richer understanding of underlying subpopulations
represented in the data. In this work we give a new approach to learning
mixtures of linear dynamical systems that is based on tensor decompositions. As
a result, our algorithm succeeds without strong separation conditions on the
components, and can be used to compete with the Bayes optimal clustering of the
trajectories. Moreover our algorithm works in the challenging
partially-observed setting. Our starting point is the simple but powerful
observation that the classic Ho-Kalman algorithm is a close relative of modern
tensor decomposition methods for learning latent variable models. This gives us
a playbook for how to extend it to work with more complicated generative
models.
( 2 min )
We investigate a framework for binary image denoising via restricted
Boltzmann machines (RBMs) that introduces a denoising objective in quadratic
unconstrained binary optimization (QUBO) form and is well-suited for quantum
annealing. The denoising objective is attained by balancing the distribution
learned by a trained RBM with a penalty term for deviations from the noisy
image. We derive the statistically optimal choice of the penalty parameter
assuming the target distribution has been well-approximated, and further
suggest an empirically supported modification to make the method robust to that
idealistic assumption. We also show under additional assumptions that the
denoised images attained by our method are, in expectation, strictly closer to
the noise-free images than the noisy images are. While we frame the model as an
image denoising model, it can be applied to any binary data. As the QUBO
formulation is well-suited for implementation on quantum annealers, we test the
model on a D-Wave Advantage machine, and also test on data too large for
current quantum annealers by approximating QUBO solutions through classical
heuristics.
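The shape of such an objective can be sketched in QUBO form (an assumed structure for illustration, not the paper's exact matrix): an RBM-style energy plus a penalty for deviating from the observed noisy bits, where the quadratic penalty becomes linear because binary variables satisfy x_i^2 = x_i:

```python
# Assumed illustrative structure (not the paper's exact matrix): a denoising
# QUBO adding a penalty lam * sum_i (x_i - y_i)^2 to an RBM-style energy with
# given linear and quadratic coefficients.  Because x_i in {0, 1} implies
# x_i^2 = x_i, the penalty contributes only the linear term lam * (1 - 2*y_i)
# per bit (plus a constant that does not affect the argmin).
def denoise_qubo(linear, quadratic, y, lam):
    Q = dict(quadratic)
    for i in range(len(linear)):
        Q[(i, i)] = Q.get((i, i), 0.0) + linear[i] + lam * (1 - 2 * y[i])
    return Q

Q = denoise_qubo(linear=[-1.0, 0.5], quadratic={(0, 1): 0.2}, y=[1, 0], lam=2.0)
print(Q)  # {(0, 1): 0.2, (0, 0): -3.0, (1, 1): 2.5}
```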
( 2 min )
We consider stochastic optimization problems where data is drawn from a
Markov chain. Existing methods for this setting crucially rely on knowing the
mixing time of the chain, which in real-world applications is usually unknown.
We propose the first optimization method that does not require the knowledge of
the mixing time, yet obtains the optimal asymptotic convergence rate when
applied to convex problems. We further show that our approach can be extended
to: (i) finding stationary points in non-convex optimization with Markovian
data, and (ii) obtaining better dependence on the mixing time in temporal
difference (TD) learning; in both cases, our method is completely oblivious to
the mixing time. Our method relies on a novel combination of multi-level Monte
Carlo (MLMC) gradient estimation together with an adaptive learning method.
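The MLMC idea can be sketched as follows (a generic illustration under our own assumptions, not the paper's estimator): draw a random truncation level J, average 2^J fresh samples, and debias a cheap base estimate with a telescoping correction weighted by the inverse level probability:

```python
import random

# Hedged sketch of a multi-level Monte Carlo (MLMC) estimator for a mean:
# draw a random truncation level J with P(J = j) proportional to 2^-j,
# average 2^J fresh samples, and debias the cheap level-0 estimate with a
# telescoping correction scaled by 2^J (the inverse level probability).
def mlmc_estimate(sample, max_level=10):
    avg = lambda k: sum(sample() for _ in range(2 ** k)) / (2 ** k)
    J = 1
    while J < max_level and random.random() < 0.5:  # geometric level draw
        J += 1
    return avg(0) + 2 ** J * (avg(J) - avg(J - 1))

random.seed(0)
print(mlmc_estimate(lambda: 1.0))  # constant stream -> exactly 1.0
```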
( 2 min )
We propose a goodness-of-fit measure for probability densities modeling
observations with varying dimensionality, such as text documents of differing
lengths or variable-length sequences. The proposed measure is an instance of
the kernel Stein discrepancy (KSD), which has been used to construct
goodness-of-fit tests for unnormalized densities. The KSD is defined by its
Stein operator: current operators used in testing apply to fixed-dimensional
spaces. As our main contribution, we extend the KSD to the variable-dimension
setting by identifying appropriate Stein operators, and propose a novel KSD
goodness-of-fit test. As with the previous variants, the proposed KSD does not
require the density to be normalized, allowing the evaluation of a large class
of models. Our test is shown to perform well in practice on discrete sequential
data benchmarks.
( 2 min )
Stochastic Gradient Descent (SGD) is one of the simplest and most popular
algorithms in modern statistical and machine learning due to its computational
and memory efficiency. Various averaging schemes have been proposed to
accelerate the convergence of SGD in different settings. In this paper, we
explore a general averaging scheme for SGD. Specifically, we establish the
asymptotic normality of a broad range of weighted averaged SGD solutions and
provide asymptotically valid online inference approaches. Furthermore, we
propose an adaptive averaging scheme that exhibits both optimal statistical
rate and favorable non-asymptotic convergence, drawing insights from the
optimal weight for the linear model in terms of non-asymptotic mean squared
error (MSE).
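A minimal sketch of such a weighted averaging scheme (illustrative, with polynomial weights w_s = s^p as our own assumption; the paper's adaptive scheme derives the weights from the MSE analysis):

```python
# Illustrative online update for a weighted average of SGD iterates with
# polynomial weights w_s = s^p (p = 0 gives the classic uniform
# Polyak-Ruppert average; larger p emphasizes recent iterates).
def weighted_average(iterates, p=0.0):
    avg, wsum = 0.0, 0.0
    for s, x in enumerate(iterates, start=1):
        w = s ** p
        wsum += w
        avg += (w / wsum) * (x - avg)  # online weighted-mean update
    return avg

print(weighted_average([1.0, 2.0, 3.0, 4.0], p=0.0))  # uniform mean (= 2.5)
print(weighted_average([1.0, 2.0, 3.0, 4.0], p=2.0))  # weights 1,4,9,16 -> 10/3
```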
( 2 min )
Recent years have shown amazing growth in deep learning neural networks (DNNs). This growth can be seen in more accurate models and even opening new possibilities with generative AI: large language models (LLMs) that synthesize natural language, text-to-image generators, and more. These increased capabilities of DNNs come with the cost of having massive models that […]
( 11 min )
A watershed moment on Nov. 22, 2022, was mostly virtual, yet it shook the foundations of nearly every industry on the planet. On that day, OpenAI released ChatGPT, the most advanced artificial intelligence chatbot ever developed. This set off demand for generative AI applications that help businesses become more efficient, from providing consumers with answers …
( 11 min )
Arise, members! Capcom’s legendary role-playing game Dragon’s Dogma: Dark Arisen joins the GeForce NOW library today. The RPG and THQ Nordic’s Jagged Alliance 3 are newly supported on GeForce NOW, playable on nearly any device. From Dusk Till Pawn Become the Arisen and take up the challenge in Capcom’s critically acclaimed RPG. Set in a …
( 5 min )
Because headphones rank among the most popular wearables in the market, we have an exciting opportunity to expand their capabilities through integrating existing sensors with supplementary ones to enable a wide variety of experiences that go beyond traditional audio control.
The post Thinking beyond audio: Augmenting headphones for everyday digital interactions appeared first on Microsoft Research.
( 12 min )
A crack NVIDIA team of five machine learning experts spread across four continents won all three tasks in a hotly contested, prestigious competition to build state-of-the-art recommendation systems. The results reflect the group’s savvy applying the NVIDIA AI platform to real-world challenges for these engines of the digital economy. Recommenders serve up trillions of search …
( 6 min )
Startup MosaicML is on a mission to help the AI community improve prediction accuracy, decrease costs and save time by providing tools for easy training and deployment of large AI models. In this episode of NVIDIA’s AI Podcast, host Noah Kravitz speaks with MosaicML CEO and co-founder Naveen Rao about how the company aims to …
( 5 min )
Differences in gait patterns of children with Duchenne muscular dystrophy
(DMD) and typically-developing (TD) peers are visible to the eye, but
quantifications of those differences outside of the gait laboratory have been
elusive. In this work, we measured vertical, mediolateral, and anteroposterior
acceleration using a waist-worn iPhone accelerometer during ambulation across a
typical range of velocities. Fifteen TD and fifteen DMD children from 3-16
years of age underwent eight walking/running activities, including five 25
meters walk/run speed-calibration tests at a slow walk to running speeds (SC-L1
to SC-L5), a 6-minute walk test (6MWT), a 100 meters fast-walk/jog/run
(100MRW), and a free walk (FW). For clinical anchoring purposes, participants
completed a Northstar Ambulatory Assessment (NSAA). We extracted temporospatial
gait clinical features (CFs) and applied multiple machine learning (ML)
approaches to differentiate between DMD and TD children using extracted
temporospatial gait CFs and raw data. Extracted temporospatial gait CFs showed
reduced step length and a greater mediolateral component of total power (TP)
consistent with shorter strides and Trendelenburg-like gait commonly observed
in DMD. ML approaches using temporospatial gait CFs and raw data varied in
effectiveness at differentiating between DMD and TD controls at different
speeds, with an accuracy of up to 100%. We demonstrate that by using ML with
accelerometer data from a consumer-grade smartphone, we can capture
DMD-associated gait characteristics in toddlers to teens.
( 3 min )
Recent groundbreaking developments on generative modeling have sparked
interest in practical single-model attribution. Such methods predict whether a
sample was generated by a specific generator or not, for instance, to prove
intellectual property theft. However, previous works are either limited to the
closed-world setting or require undesirable changes of the generative model. We
address these shortcomings by proposing FLIPAD, a new approach for single-model
attribution in the open-world setting based on final-layer inversion and
anomaly detection. We show that the utilized final-layer inversion can be
reduced to a convex lasso optimization problem, making our approach
theoretically sound and computationally efficient. The theoretical findings are
accompanied by an experimental study demonstrating the effectiveness of our
approach, outperforming the existing methods.
( 2 min )
We propose a simple and efficient approach to generating prediction intervals
(PIs) for approximated and forecasted trends. Our method leverages a weighted
asymmetric loss function to estimate the lower and upper bounds of the PI, with
the weights determined by its coverage probability. We provide a concise
mathematical proof of the method, show how it can be extended to derive PIs for
parametrised functions and argue why the method works for predicting PIs of
dependent variables. The presented tests of the method on a real-world
forecasting task using a neural network-based model show that it can produce
reliable PIs in complex machine learning scenarios.
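The core of the method can be sketched with a pinball-style asymmetric loss (our illustration; the exact weighting in the paper may differ): with weight q tied to the desired coverage, the minimizing constant tracks the q-th quantile, giving an upper PI bound:

```python
# Illustrative pinball-style asymmetric loss: with weight q in (0, 1), the
# constant b minimizing the loss over residuals y - b is the empirical
# q-quantile of y, so q near 1 yields an upper PI bound and q near 0 a
# lower one (the paper ties this weight to the target coverage probability).
def asymmetric_loss(residuals, q):
    return sum(q * r if r >= 0 else (q - 1) * r for r in residuals) / len(residuals)

ys = [1.0, 2.0, 3.0, 4.0, 5.0]
upper = min(ys, key=lambda b: asymmetric_loss([y - b for y in ys], q=0.9))
print(upper)  # 5.0 -- the candidate closest to the 0.9 quantile
```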
( 2 min )
The importance of high data quality is increasing with the growing impact and
distribution of ML systems and big data. The planned AI Act from the European
Commission also defines challenging legal requirements for data quality,
especially for the market introduction of safety-relevant ML systems. In this
paper we introduce a novel approach that supports the data quality assurance
process across multiple data quality aspects. This approach enables the
verification of quantitative data quality requirements. The concept and its
benefits are introduced and explained on small example data sets. How the
method is applied is demonstrated on the well-known MNIST data set of
handwritten digits.
( 2 min )
Deep generative chemistry models emerge as powerful tools to expedite drug
discovery. However, the immense size and complexity of the structural space of
all possible drug-like molecules pose significant obstacles, which could be
overcome with hybrid architectures combining quantum computers with deep
classical networks. As the first step toward this goal, we built a compact
discrete variational autoencoder (DVAE) with a Restricted Boltzmann Machine
(RBM) of reduced size in its latent layer. The size of the proposed model was
small enough to fit on a state-of-the-art D-Wave quantum annealer and allowed
training on a subset of the ChEMBL dataset of biologically active compounds.
Finally, we generated 2331 novel chemical structures with medicinal chemistry
and synthetic accessibility properties in the ranges typical for molecules from
ChEMBL. The presented results demonstrate the feasibility of using already
existing or soon-to-be-available quantum computing devices as testbeds for
future drug discovery applications.
( 2 min )
Partial differential equations (PDEs) are a model candidate for soft sensors
in industrial processes with spatiotemporal dependence. Although
physics-informed neural networks (PINNs) are a promising machine learning
method for solving PDEs, they are infeasible for nonhomogeneous PDEs with
unmeasurable source terms. To this end, a coupled PINN (CPINN) with a recurrent
prediction (RP) learning strategy (CPINN-RP) is proposed. First, a CPINN
composed of NetU and NetG is proposed: NetU approximates the PDE solution,
while NetG regularizes the training of NetU. The two networks are integrated
into a data-physics-hybrid loss function. We theoretically prove that the
proposed CPINN has satisfactory approximation capability for solutions to
nonhomogeneous PDEs with unmeasurable source terms, and, beyond the theoretical
aspects, we propose a hierarchical training strategy to optimize and couple
NetU and NetG. Second, NetU-RP is proposed to compensate for information loss
in data sampling and thereby improve prediction performance, where RP consists
of the recurrently delayed outputs of the well-trained CPINN and hard sensors.
Finally, artificial and practical datasets are used to verify the feasibility
and effectiveness of CPINN-RP for soft sensors.
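As a rough illustration of the data-physics-hybrid loss described above, the sketch below combines a data-fit term with a PDE-residual term for a 1-D nonhomogeneous equation u''(x) = g(x). The names `hybrid_loss`, `net_u`, and `net_g`, the finite-difference residual (standing in for automatic differentiation), and the weight `w` are all illustrative assumptions, not the paper's implementation.

```python
def hybrid_loss(net_u, net_g, x_data, u_data, x_colloc, h=1e-3, w=1.0):
    """Data-physics-hybrid loss for u''(x) = g(x) (1-D sketch).

    net_u, net_g: callables approximating the solution and the source term.
    x_data, u_data: sparse sensor measurements (data-fit term).
    x_colloc: collocation points where the PDE residual is enforced.
    """
    # Data loss: NetU should fit the measurements.
    data_loss = sum((net_u(x) - u) ** 2 for x, u in zip(x_data, u_data)) / len(x_data)
    # Physics loss: a central finite difference approximates u'' at each
    # collocation point, and NetG supplies the (unmeasurable) source term.
    phys_loss = 0.0
    for x in x_colloc:
        u_xx = (net_u(x + h) - 2 * net_u(x) + net_u(x - h)) / h ** 2
        phys_loss += (u_xx - net_g(x)) ** 2
    phys_loss /= len(x_colloc)
    return data_loss + w * phys_loss
```

With a consistent pair, e.g. net_u(x) = x² and net_g(x) = 2, both terms vanish up to floating-point error.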
( 3
min )
This paper introduces Local Learner (2L), an algorithm for providing a set of
reference strategies to guide the search for programmatic strategies in
two-player zero-sum games. Previous learning algorithms, such as Iterated Best
Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be
computationally expensive or miss important information for guiding search
algorithms. 2L actively selects a set of reference strategies to improve the
search signal. We empirically demonstrate the advantages of our approach while
guiding a local search algorithm for synthesizing strategies in three games,
including MicroRTS, a challenging real-time strategy game. Results show that 2L
learns reference strategies that provide a stronger search signal than IBR, FP,
and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L
outperformed the winners of the two latest MicroRTS competitions, which were
programmatic strategies written by human programmers.
( 2
min )
We propose a new online algorithm for cumulative regret minimization in a
stochastic linear bandit. The algorithm pulls the arm with the highest
estimated reward in a linear model trained on its perturbed history. Therefore,
we call it perturbed-history exploration in a linear bandit (LinPHE). The
perturbed history is a mixture of observed rewards and randomly generated
i.i.d. pseudo-rewards. We derive a $\tilde{O}(d \sqrt{n})$ gap-free bound on
the $n$-round regret of LinPHE, where $d$ is the number of features. The key
steps in our analysis are new concentration and anti-concentration bounds on
the weighted sum of Bernoulli random variables. To show the generality of our
design, we generalize LinPHE to a logistic model. We evaluate our algorithms
empirically and show that they are practical.
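The perturbed-history idea can be sketched as follows, assuming rewards in [0, 1], a ridge-regression estimator, and `a` Bernoulli(1/2) pseudo-rewards injected per observation; the function names and the naive Gaussian-elimination solver are illustrative, not the paper's code.

```python
import random

def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def linphe_choose_arm(features, history, a=1, lam=1.0):
    """Pick an arm via perturbed-history exploration (sketch).

    features: list of d-dimensional arm feature vectors.
    history:  list of (feature_vector, observed_reward) pairs.
    a:        pseudo-rewards mixed in per real observation.
    """
    d = len(features[0])
    # Perturb the history: each real observation is paired with `a`
    # i.i.d. Bernoulli(1/2) pseudo-rewards on the same feature vector.
    perturbed = []
    for x, r in history:
        perturbed.append((x, r))
        for _ in range(a):
            perturbed.append((x, float(random.randint(0, 1))))
    # Ridge regression: theta = (X^T X + lam I)^{-1} X^T y.
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, r in perturbed:
        for i in range(d):
            b[i] += r * x[i]
            for j in range(d):
                A[i][j] += x[i] * x[j]
    theta = _solve(A, b)
    # Exploit: pull the arm with the highest estimated reward.
    return max(range(len(features)),
               key=lambda k: sum(t * f for t, f in zip(theta, features[k])))
```

The pseudo-rewards randomize the estimate just enough to drive exploration without an explicit confidence bound.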
( 2
min )
Recent work has applied supervised deep learning to derive continuous
symmetry transformations that preserve the data labels and to obtain the
corresponding algebras of symmetry generators. This letter introduces two
improved algorithms that significantly speed up the discovery of these symmetry
transformations. The new methods are demonstrated by deriving the complete set
of generators for the unitary groups U(n) and the exceptional Lie groups $G_2$,
$F_4$, and $E_6$. A third post-processing algorithm renders the found
generators in sparse form. We benchmark the performance improvement of the new
algorithms relative to the standard approach. Given the significant complexity
of the exceptional Lie groups, our results demonstrate that this
machine-learning method for discovering symmetries is completely general and
can be applied to a wide variety of labeled datasets.
( 2
min )
A novel human emotion recognition method based on automatically selected
Galvanic Skin Response (GSR) signal features and an SVM is proposed in this
paper. GSR signals were acquired by the e-Health Sensor Platform V2.0. The data
are then de-noised with a wavelet function and normalized to remove individual
differences. 30 features are extracted from the normalized data; however,
directly using these features leads to a low recognition rate. To obtain an
optimized feature set, covariance-based feature selection is employed in our
method. Finally, an SVM taking the optimized features as input is used to
perform human emotion recognition. The experimental results indicate that the
proposed method achieves good human emotion recognition, with an accuracy of
more than 66.67%.
( 2
min )
Score-based generative models are a new class of generative models that have
been shown to accurately generate high-dimensional calorimeter datasets. Recent
advances in generative models have used images with 3D voxels to represent and
model complex calorimeter showers. Point clouds, however, are likely a more
natural representation of calorimeter showers, particularly in calorimeters
with high granularity. Point clouds preserve all of the information of the
original simulation, deal more naturally with sparse datasets, and can be
implemented with more compact models and data files. In this work, two
state-of-the-art score-based models are trained on the same set of calorimeter
simulations and directly compared.
( 2
min )
Business statistics play a crucial role in implementing a data-driven
strategic plan at the enterprise level to employ various analytics where the
outcomes of such a plan enable an enterprise to enhance the decision-making
process or to mitigate risks to the organization. In this work, a strategic
plan informed by the statistical analysis is introduced for a financial company
called LendingClub, where the plan is comprised of exploring the possibility of
onboarding a big data platform along with advanced feature selection
capabilities. The main objectives of such a plan are to increase the company's
revenue while reducing the risks of granting loans to borrowers who cannot
return their loans. In this study, different hypotheses formulated to address
the company's concerns are studied, where the results reveal that the amount of
loans profoundly impacts the number of borrowers charging off their loans.
Also, the proposed strategic plan includes onboarding advanced analytics such
as machine learning technologies that allow the company to build better
generalized data-driven predictive models.
( 2
min )
As more and more customers are looking to put machine learning (ML) workloads in production, there is a large push in organizations to shorten the development lifecycle of ML code. Many organizations prefer writing their ML code in a production-ready style in the form of Python methods and classes as opposed to an exploratory style […]
( 8
min )
Efficiency is vital in the face of escalating demand for cloud resources, and efficient power management strategies address the bottleneck of power availability in datacenters. Learn how we optimize power allocation to support sustainable resource usage.
The post Microsoft at ICALP 2023: Deploying cloud capacity robustly against power failures appeared first on Microsoft Research.
( 11
min )
Jacob Norris is a 3D artist and the president, co-founder and creative director of Sierra Division Studios — an outsource studio specializing in digital 3D content creation.
( 9
min )
Luca Carlone and Jonathan How of MIT LIDS discuss how future robots might perceive and interact with their environment.
( 8
min )
NVIDIA RTX is spinning new cycles for designs. Trek Bicycle is using GPUs to bring design concepts to life. The Wisconsin-based company, one of the largest bicycle manufacturers in the world, aims to create bikes with the highest-quality craftsmanship. With its new partner Lidl, an international retailer chain, Trek Bicycle also owns a cycling team…
( 7
min )
This research paper was accepted by 2023 USENIX Annual Technical Conference (ATC), which is dedicated to advancing the field of systems research. Whether they’re personal computers or cloud instances, it’s crucial to ensure that the computer systems people use every day are reliable and secure. The validity of these systems is critical because if storage […]
The post Renovating computer systems securely and progressively with APRON appeared first on Microsoft Research.
( 10
min )
When an enterprise project is low-profile (“below the radar”), then it is not likely to be the target of bad actors. Similarly, if some part of that project’s infrastructure fails or falters, then the consequences of the problem and/or the urgency of providing a solution are usually manageable. But when a high-profile (“above the radar”)…
The post AIOps above the radar – Using AI to monitor your AI infrastructure appeared first on Data Science Central.
( 22
min )
There is a recent article, Unraveling the Mystery of Human Consciousness, where it was stated that, “Consciousness makes us capable of experiencing the scent of a rose, the touch of a breeze, the taste of food, the sound of music, and the sight of a sunrise. We also have a unique ability to be aware…
The post Sentience: AI has demystified human consciousness, intelligence appeared first on Data Science Central.
( 19
min )
Amazon Kendra and LlamaIndex can help with knowledge integration but fall short in connecting diverse knowledge sources to enable efficient intelligent search. In this article, we compare the existing solutions and explain how to overcome their limitations using a Google Drive crawler. Companies often face difficulties in consolidating their knowledge base when their data is…
The post Exploring intelligent search solutions: A comparative analysis of Amazon Kendra integration and large language model crawlers appeared first on Data Science Central.
( 27
min )
Identifying and suppressing unknown disturbances to dynamical systems is a
problem with applications in many different fields. In this Letter, we present
a model-free method to identify and suppress an unknown disturbance to an
unknown system based only on previous observations of the system under the
influence of a known forcing function. We find that, under very mild
restrictions on the training function, our method is able to robustly identify
and suppress a large class of unknown disturbances. We illustrate our scheme
with an example where a chaotic disturbance to the Lorenz system is identified
and suppressed.
( 2
min )
We discuss a vulnerability involving a category of attribution methods used
to provide explanations for the outputs of convolutional neural networks
working as classifiers. It is known that this type of network is vulnerable
to adversarial attacks, in which imperceptible perturbations of the input may
alter the outputs of the model. In contrast, here we focus on effects that
small modifications in the model may cause on the attribution method without
altering the model outputs.
( 2
min )
We consider the problem of clustering privately a dataset in $\mathbb{R}^d$
that undergoes both insertion and deletion of points. Specifically, we give an
$\varepsilon$-differentially private clustering mechanism for the $k$-means
objective under continual observation. This is the first approximation
algorithm for that problem with an additive error that depends only
logarithmically on the number $T$ of updates. The multiplicative error is
almost the same as in the non-private setting. To do so, we show how to perform dimension
reduction under continual observation and combine it with a differentially
private greedy approximation algorithm for $k$-means. We also partially extend
our results to the $k$-median problem.
( 2
min )
This research proposes a machine learning-based attack detection model for
power systems, specifically targeting smart grids. By utilizing data and logs
collected from Phasor Measurement Units (PMUs), the model aims to learn system
behaviors and effectively identify potential security boundaries. The proposed
approach involves crucial stages including dataset pre-processing, feature
selection, model creation, and evaluation. To validate our approach, we used a
dataset consisting of 15 separate datasets obtained from different PMUs, relay
Snort alarms, and logs. Three machine learning models, Random Forest, Logistic
Regression, and K-Nearest Neighbour, were built and evaluated using various
performance metrics. The findings indicate that the Random Forest model
achieves the highest performance, with an accuracy of 90.56% in detecting power
system disturbances, and has the potential to assist operators in
decision-making processes.
( 2
min )
Feature-distributed data, referring to data partitioned by features and stored
across multiple computing nodes, are increasingly common in applications with a
large number of features. This paper proposes a two-stage relaxed greedy
algorithm (TSRGA) for applying multivariate linear regression to such data. The
main advantage of TSRGA is that its communication complexity does not depend on
the feature dimension, making it highly scalable to very large data sets. In
addition, for multivariate response variables, TSRGA can be used to yield
low-rank coefficient estimates. The fast convergence of TSRGA is validated by
simulation experiments. Finally, we apply the proposed TSRGA in a financial
application that leverages unstructured data from the 10-K reports,
demonstrating its usefulness in applications with many dense large-dimensional
matrices.
( 2
min )
Chronic stress can significantly affect physical and mental health. The
advent of wearable technology allows for the tracking of physiological signals,
potentially leading to innovative stress prediction and intervention methods.
However, challenges such as label scarcity and data heterogeneity render stress
prediction difficult in practice. To counter these issues, we have developed a
multimodal personalized stress prediction system using wearable biosignal data.
We employ self-supervised learning (SSL) to pre-train the models on each
subject's data, allowing the models to learn the baseline dynamics of the
participant's biosignals prior to fine-tuning on the stress prediction task. We
test our model on the Wearable Stress and Affect Detection (WESAD) dataset,
demonstrating that our SSL models outperform non-SSL models while utilizing
less than 5% of the annotations. These results suggest that our approach can
personalize stress prediction to each user with minimal annotations. This
paradigm has the potential to enable personalized prediction of a variety of
recurring health events using complex multimodal data streams.
( 2
min )
We envision a warehouse in which dozens of mobile robots and human pickers
work together to collect and deliver items within the warehouse. The
fundamental problem we tackle, called the order-picking problem, is how these
worker agents must coordinate their movement and actions in the warehouse to
maximise performance (e.g. order throughput). Established industry methods
using heuristic approaches require large engineering efforts to optimise for
innately variable warehouse configurations. In contrast, multi-agent
reinforcement learning (MARL) can be flexibly applied to diverse warehouse
configurations (e.g. size, layout, number/types of workers, item replenishment
frequency), as the agents learn through experience how to optimally cooperate
with one another. We develop hierarchical MARL algorithms in which a manager
assigns goals to worker agents, and the policies of the manager and workers are
co-trained toward maximising a global objective (e.g. pick rate). Our
hierarchical algorithms achieve significant gains in sample efficiency and
overall pick rates over baseline MARL algorithms in diverse warehouse
configurations, and substantially outperform two established industry
heuristics for order-picking systems.
( 2
min )
Transfer learning leverages feature representations of deep neural networks
(DNNs) pretrained on source tasks with rich data to empower effective
finetuning on downstream tasks. However, the pretrained models are often
prohibitively large for delivering generalizable representations, which limits
their deployment on edge devices with constrained resources. To close this gap,
we propose a new transfer learning pipeline, which leverages our finding that
robust tickets can transfer better, i.e., subnetworks drawn with properly
induced adversarial robustness can win better transferability over vanilla
lottery ticket subnetworks. Extensive experiments and ablation studies validate
that our proposed transfer learning pipeline can achieve enhanced
accuracy-sparsity trade-offs across both diverse downstream tasks and sparsity
patterns, further enriching the lottery ticket hypothesis.
( 2
min )
Ensuring the trustworthiness and interpretability of machine learning models
is critical to their deployment in real-world applications. Feature attribution
methods have gained significant attention, which provide local explanations of
model predictions by attributing importance to individual input features. This
study examines the generalization of feature attributions across various deep
learning architectures, such as convolutional neural networks (CNNs) and vision
transformers. We aim to assess the feasibility of utilizing a feature
attribution method as a feature detector and examine how these features can be
harmonized across multiple models employing distinct architectures but trained
on the same data distribution. By exploring this harmonization, we aim to
develop a more coherent and optimistic understanding of feature attributions,
enhancing the consistency of local explanations across diverse deep-learning
models. Our findings highlight the potential for harmonized feature attribution
methods to improve interpretability and foster trust in machine learning
applications, regardless of the underlying architecture.
( 2
min )
How do language models "think"? This paper formulates a probabilistic
cognitive model called the bounded pragmatic speaker, which can characterize
the operation of different variations of language models. Specifically, we
demonstrate that large language models fine-tuned with reinforcement learning
from human feedback (Ouyang et al., 2022) embody a model of thought that
conceptually resembles a fast-and-slow model (Kahneman, 2011), which
psychologists have attributed to humans. We discuss the limitations of
reinforcement learning from human feedback as a fast-and-slow model of thought
and propose avenues for expanding this framework. In essence, our research
highlights the value of adopting a cognitive probabilistic modeling approach to
gain insights into the comprehension, evaluation, and advancement of language
models.
( 2
min )
Complex diseases are caused by a multitude of factors that may differ between
patients even within the same diagnostic category. A few underlying root causes
may nevertheless initiate the development of disease within each patient. We
therefore focus on identifying patient-specific root causes of disease, which
we equate to the sample-specific predictivity of the exogenous error terms in a
structural equation model. We generalize from the linear setting to the
heteroscedastic noise model where $Y = m(X) + \varepsilon\sigma(X)$ with
non-linear functions $m(X)$ and $\sigma(X)$ representing the conditional mean
and mean absolute deviation, respectively. This model preserves identifiability
but introduces non-trivial challenges that require a customized algorithm
called Generalized Root Causal Inference (GRCI) to extract the error terms
correctly. GRCI recovers patient-specific root causes more accurately than
existing alternatives.
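Given fitted estimates of m and σ, the exogenous errors in the heteroscedastic model can be recovered by inverting Y = m(X) + ε·σ(X). The sketch below shows only this extraction step; estimating m and σ from data, which GRCI handles carefully, is out of scope here, and the function names are illustrative.

```python
def extract_errors(x, y, m_hat, sigma_hat):
    """Recover exogenous error terms under Y = m(X) + eps * sigma(X).

    m_hat:     fitted conditional mean function.
    sigma_hat: fitted conditional mean-absolute-deviation function.
    Returns the recovered error eps_i = (y_i - m(x_i)) / sigma(x_i) per sample.
    """
    return [(yi - m_hat(xi)) / sigma_hat(xi) for xi, yi in zip(x, y)]
```

With the true m and σ, the recovered errors match the generating ones exactly, which is what the model's identifiability guarantees.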
( 2
min )
Federated learning methods enable model training across distributed data
sources without data leaving their original locations and have gained
increasing interest in various fields. However, existing approaches are
limited, excluding many structured probabilistic models. We present a general
and elegant solution based on structured variational inference, widely used in
Bayesian machine learning, adapted for the federated setting. Additionally, we
provide a communication-efficient variant analogous to the canonical FedAvg
algorithm. The proposed algorithms' effectiveness is demonstrated, and their
performance is compared with hierarchical Bayesian neural networks and topic
models.
( 2
min )
In this paper, we investigate the training process of generative networks
that use a type of probability density distance named particle-based distance
as the objective function, e.g. MMD GAN, Cram\'er GAN, EIEG GAN. However, these
GANs often suffer from unstable training. Here, we analyze the stability of
the training process of these GANs from the
perspective of probability density dynamics. In our framework, we regard the
discriminator $D$ in these GANs as a feature transformation mapping that maps
high dimensional data into a feature space, while the generator $G$ maps random
variables to samples that resemble real data in terms of feature space. This
perspective enables us to perform stability analysis for the training of GANs
using the Wasserstein gradient flow of the probability density function. We
find that the training process of the discriminator is usually unstable due to
the formulation of $\min_G \max_D E(G, D)$ in GANs. To address this issue, we
add a stabilizing term in the discriminator loss function. We conduct
experiments to validate our stability analysis and stabilizing method.
( 2
min )
Despite achieving great success in real-world applications, Deep
Reinforcement Learning (DRL) still suffers from three critical issues: data
inefficiency, lack of interpretability, and limited transferability. Recent
research shows that embedding symbolic knowledge into DRL is promising in
addressing those challenges. Inspired by this, we introduce a novel deep
reinforcement learning framework with symbolic options. Our framework features
a loop training procedure, which guides policy improvement by planning with
planning models (including action models and hierarchical task network models)
and with symbolic options learned automatically from interactive trajectories.
The learned symbolic options alleviate the heavy requirement for expert domain
knowledge and provide inherent interpretability of policies. Moreover,
transferability and data efficiency can be further improved by planning with
the symbolic planning models. To validate the effectiveness of our framework,
we conduct experiments on two domains, Montezuma's Revenge and Office World.
The results demonstrate comparable performance along with improved data
efficiency, interpretability, and transferability.
( 2
min )
We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and
adaptively provides high-quality intrinsic rewards to enhance exploration in
reinforcement learning (RL). More specifically, AIRS selects a shaping
function from a predefined set based on the estimated task return in real time,
providing reliable exploration incentives and alleviating the biased objective
problem. Moreover, we develop an intrinsic reward toolkit to provide efficient
and reliable implementations of diverse intrinsic reward approaches. We test
AIRS on various tasks of MiniGrid, Procgen, and DeepMind Control Suite.
Extensive simulation demonstrates that AIRS can outperform the benchmarking
schemes and achieve superior performance with simple architecture.
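One plausible way to select a shaping function from a predefined set based on estimated task return is a UCB-style rule over the candidate functions, as sketched below. This is a generic bandit-selection sketch under that assumption, not AIRS's actual mechanism, and all names are illustrative.

```python
import math

def select_shaping(returns_per_fn, counts, t, c=1.0):
    """UCB-style selection of an intrinsic-reward shaping function (sketch).

    returns_per_fn: running mean task return observed under each shaping fn.
    counts: how often each fn has been selected so far; t: current step.
    """
    def ucb(i):
        if counts[i] == 0:
            return float("inf")  # try every shaping function at least once
        return returns_per_fn[i] + c * math.sqrt(math.log(t + 1) / counts[i])
    return max(range(len(counts)), key=ucb)
```

Untried shaping functions are forced first; afterwards the rule balances the best observed return against selection uncertainty.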
( 2
min )
In this work, we focus on the communication aspect of decentralized learning,
which involves multiple agents training a shared machine learning model using
decentralized stochastic gradient descent (D-SGD) over distributed data. In
particular, we investigate the impact of broadcast transmission and
probabilistic random access policy on the convergence performance of D-SGD,
considering the broadcast nature of wireless channels and the link dynamics in
the communication topology. Our results demonstrate that optimizing the access
probability to maximize the expected number of successful links is a highly
effective strategy for accelerating the system convergence.
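Under a simple slotted-ALOHA-style success model, where a broadcast succeeds when exactly one of n nodes transmits in a slot, the access probability maximizing the expected number of successful links can be found numerically as below; the analytic optimum for this model is p = 1/n. The success model is an assumption for illustration, not the paper's channel model.

```python
def expected_successful_links(p, n):
    """Expected collision-free broadcasts per slot when each of n nodes
    transmits independently with probability p (ALOHA-style model)."""
    return n * p * (1 - p) ** (n - 1)

def best_access_probability(n, grid=10000):
    """Grid-search the access probability maximizing expected successes."""
    return max((i / grid for i in range(1, grid)),
               key=lambda p: expected_successful_links(p, n))
```

For n = 10 the search lands on p ≈ 0.1, matching the 1/n analytic optimum.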
( 2
min )
We introduce Breadth-First Pipeline Parallelism, a novel training schedule
which optimizes the combination of pipeline and data parallelism. Breadth-First
Pipeline Parallelism lowers training time, cost and memory usage by combining a
high GPU utilization with a small batch size per GPU, and by making use of
fully sharded data parallelism. Experimentally, we observed an increase of up
to 43% in training throughput for a 52 billion-parameter model using a small
batch size per GPU compared to Megatron-LM, which would reduce the training
time and cost by the same amount on a large GPU cluster.
( 2
min )
Animating still face images with deep generative models using a speech input
signal is an active research topic and has seen important recent progress.
However, much of the effort has been put into lip syncing and rendering quality
while the generation of natural head motion, let alone the audio-visual
correlation between head motion and speech, has often been neglected. In this
work, we propose a multi-scale audio-visual synchrony loss and a multi-scale
autoregressive GAN to better handle short and long-term correlation between
speech and the dynamics of the head and lips. In particular, we train a stack
of syncer models on multimodal input pyramids and use these models as guidance
in a multi-scale generator network to produce audio-aligned motion unfolding
over diverse time scales. Our generator operates in the facial landmark domain,
which is a standard low-dimensional head representation. The experiments show
significant improvements over the state of the art in head motion dynamics
quality and in multi-scale audio-visual synchrony both in the landmark domain
and in the image domain.
( 2
min )
Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy
properties. This has introduced application workloads that comprise multiple
DNN applications, raising new challenges regarding workload distribution.
Equipped with a diverse set of accelerators, newer embedded systems present
architectural heterogeneity, which current run-time controllers are unable to
fully utilize. To enable high throughput in multi-DNN workloads, such a
controller ought to explore hundreds of thousands of possible solutions to
exploit the underlying heterogeneity. In this paper, we propose OmniBoost, a
lightweight and extensible multi-DNN manager for heterogeneous embedded
devices. We leverage stochastic space exploration combined with a highly
accurate performance estimator to observe a 4.6x average throughput boost
compared to other state-of-the-art methods. The evaluation was performed on the
HiKey970 development board.
( 2
min )
The mathematical representation of data in the Spherical Harmonic (SH)
domain has recently regained increasing interest in the machine learning
community. This technical report gives an in-depth introduction to the
theoretical foundation and practical implementation of SH representations,
summarizing works on rotation-invariant and equivariant features, as well as
convolutions and exact correlations of signals on spheres. In extension, these
methods are then generalized from scalar SH representations to Vectorial
Harmonics (VH), providing the same capabilities for 3D vector fields on
spheres.
min )
There is substantial empirical evidence about the success of dynamic
implementations of Hamiltonian Monte Carlo (HMC), such as the No U-Turn Sampler
(NUTS), in many challenging inference problems but theoretical results about
their behavior are scarce. The aim of this paper is to fill this gap. More
precisely, we consider a general class of MCMC algorithms we call dynamic HMC.
We show that this general framework encompasses NUTS as a particular case,
implying the invariance of the target distribution as a by-product. Second, we
establish conditions under which NUTS is irreducible and aperiodic, and as a
corollary ergodic. Under conditions similar to those existing for HMC, we
also show that NUTS is geometrically ergodic. Finally, we improve existing
convergence results for HMC showing that this method is ergodic without any
boundedness condition on the stepsize and the number of leapfrog steps, in the
case where the target is a perturbation of a Gaussian distribution.
( 2
min )
We investigate enhancing the sensitivity of new physics searches at the LHC
by machine learning in the case of background dominance and a high degree of
overlap between the observables for signal and background. We use two different
models, XGBoost and a deep neural network, to exploit correlations between
observables and compare this approach to the traditional cut-and-count method.
We consider different methods to analyze the models' output, finding that a
template fit generally performs better than a simple cut. By means of a Shapley
decomposition, we gain additional insight into the relationship between event
kinematics and the machine learning model output. We consider a supersymmetric
scenario with a metastable sneutrino as a concrete example, but the methodology
can be applied to a much wider class of models.
( 2
min )
We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound
(BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary
environments. This unique combination of Bayesian and frequentist principles
enhances adaptability and performance in dynamic settings. The BOF-UCB
algorithm utilizes sequential Bayesian updates to infer the posterior
distribution of the unknown regression parameter, and subsequently employs a
frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing
the expected reward over the posterior distribution. We provide theoretical
guarantees of BOF-UCB's performance and demonstrate its effectiveness in
balancing exploration and exploitation on synthetic datasets and classical
control tasks in a reinforcement learning setting. Our results show that
BOF-UCB outperforms existing methods, making it a promising solution for
sequential decision-making in non-stationary environments.
( 2
min )
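The two ingredients combined in BOF-UCB, a sequential Bayesian posterior update and a UCB computed from that posterior, can be sketched for the stationary linear case. The class name, the Gaussian prior, and the exploration weight `beta` below are illustrative, and the sketch deliberately omits the paper's handling of non-stationarity:

```python
import numpy as np

class BayesLinUCB:
    """Sketch: Bayesian posterior for a linear reward model, plus a UCB
    built from the posterior mean and covariance."""

    def __init__(self, dim, lam=1.0, beta=1.0):
        self.A = lam * np.eye(dim)   # posterior precision (prior: lam * I)
        self.b = np.zeros(dim)
        self.beta = beta             # exploration weight

    def update(self, x, reward):
        # sequential Bayesian update, assuming unit observation noise
        self.A += np.outer(x, x)
        self.b += reward * x

    def ucb(self, x):
        cov = np.linalg.inv(self.A)  # posterior covariance
        mu = cov @ self.b            # posterior mean
        return x @ mu + self.beta * np.sqrt(x @ cov @ x)

    def select(self, contexts):
        return max(range(len(contexts)), key=lambda i: self.ucb(contexts[i]))

# toy run: the true parameter favours the first arm
rng = np.random.default_rng(0)
theta = np.array([1.0, 0.0])
bandit = BayesLinUCB(dim=2)
contexts = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
for _ in range(200):
    i = bandit.select(contexts)
    r = contexts[i] @ theta + 0.1 * rng.standard_normal()
    bandit.update(contexts[i], r)
```

After a couple of hundred rounds the posterior mean concentrates near `theta` and the bonus term shrinks for the frequently pulled arm, illustrating the exploration-exploitation balance the abstract refers to.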
ChatGPT is a large language model developed by OpenAI. Despite its impressive
performance across various tasks, no prior work has investigated its capability
in the biomedical domain yet. To this end, this paper aims to evaluate the
performance of ChatGPT on various benchmark biomedical tasks, such as relation
extraction, document classification, question answering, and summarization. To
the best of our knowledge, this is the first work that conducts an extensive
evaluation of ChatGPT in the biomedical domain. Interestingly, our evaluation
finds that on biomedical datasets with smaller training sets,
zero-shot ChatGPT even outperforms the state-of-the-art fine-tuned generative
transformer models, such as BioGPT and BioBART. This suggests that ChatGPT's
pre-training on large text corpora makes it quite specialized even in the
biomedical domain. Our findings demonstrate that ChatGPT has the potential to
be a valuable tool for various tasks in the biomedical domain that lack large
annotated data.
( 2
min )
Distributional Graphormer, Microsoft’s new deep learning framework for predicting the equilibrium distribution of molecular structures, can generate realistic and diverse molecular structures with high efficiency and low cost.
The post Distributional Graphormer: Toward equilibrium distribution prediction for molecular systems appeared first on Microsoft Research.
( 14
min )
This AI system only needs a small amount of data to predict molecular properties, which could speed up drug discovery and material development.
( 9
min )
One of the essential components of deep learning is the choice of the loss
function and performance metrics used to train and evaluate models. This paper
reviews the most prevalent loss functions and performance measurements in deep
learning. We examine the benefits and limits of each technique and illustrate
their application to various deep-learning problems. Our review aims to give a
comprehensive picture of the different loss functions and performance
indicators used in the most common deep learning tasks and help practitioners
choose the best method for their specific task.
( 2
min )
Human language acquisition is an efficient, supervised, and continual
process. In this work, we took inspiration from how human babies acquire their
first language, and developed a computational process for word acquisition
through comparative learning. Motivated by cognitive findings, we generated a
small dataset that enables computational models to compare the similarities
and differences of various attributes, and learn to filter out and extract the
common information for each shared linguistic label. We frame the acquisition
of words as not only the information filtration process, but also as
representation-symbol mapping. This procedure does not involve a fixed
vocabulary size, nor a discriminative objective, and allows the models to
continually learn more concepts efficiently. Our results in controlled
experiments have shown the potential of this approach for efficient continual
learning of grounded words.
( 2
min )
This paper introduces Track Mix, a personalized playlist generation system
released in 2022 on the music streaming service Deezer. Track Mix automatically
generates "mix" playlists inspired by initial music tracks, allowing users to
discover music similar to their favorite content. To generate these mixes, we
consider a Transformer model trained on millions of track sequences from user
playlists. In light of the growing popularity of Transformers in recent years,
we analyze the advantages, drawbacks, and technical challenges of using such a
model for mix generation on the service, compared to a more traditional
collaborative filtering approach. Since its release, Track Mix has been
generating playlists for millions of users daily, enhancing their music
discovery experience on Deezer.
( 2
min )
Over the course of the past two decades, a substantial body of research has
substantiated the viability of utilising cardiac signals as a biometric
modality. This paper presents a novel approach for patient identification in
healthcare systems using electrocardiogram signals. A convolutional neural
network is used to classify users based on images extracted from ECG signals.
The proposed identification system is evaluated in multiple databases,
providing a comprehensive understanding of its potential in real-world
scenarios. The impact of cardiovascular diseases on generic user identification
has been largely overlooked in previous studies. The presented method takes
into account the cardiovascular condition of the patients, ensuring that the
results obtained are not biased or limited. Furthermore, the results obtained
are consistent and reliable, with lower error rates and higher accuracy
metrics, as demonstrated through extensive experimentation. All these features
make the proposed method a valuable contribution to the field of patient
identification in healthcare systems, and make it a strong contender for
practical applications.
( 2
min )
Coding problems are problems that require a solution in the form of a
computer program. They are popular among students and professionals, as solving
them enhances their skills and career opportunities. An AI system that would
help those who practice coding problems would be highly useful and there is a
huge potential for such a system. In this work, we propose a model which uses
stacking of hyperparameter tuned boosting models to achieve impressive metric
scores of 77.8% accuracy and 0.815 PR-AUC on the dataset that was scraped from
Codeforces and Leetcode. We open source the dataset and the models developed
for this work.
( 2
min )
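The stacking pattern described above can be sketched with scikit-learn. This is not the authors' exact setup (their base learners are hyperparameter-tuned boosting models on data scraped from Codeforces and Leetcode); here one boosted and one bagged base learner stand in, on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic stand-in for the scraped problem/outcome features
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# stacking: out-of-fold predictions of the base learners feed a meta-learner
stack = StackingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(n_estimators=100, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```

`StackingClassifier` trains the meta-learner on cross-validated predictions, which is what lets the ensemble outperform its individual members without leaking training labels.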
In this paper, we propose a novel tag-based recommender system called PLIERS,
which relies on the assumption that users are mainly interested in items and
tags with similar popularity to those they already own. PLIERS is aimed at
reaching a good tradeoff between algorithmic complexity and the level of
personalization of recommended items. To evaluate PLIERS, we performed a set of
experiments on real OSN datasets, demonstrating that it outperforms
state-of-the-art solutions in terms of personalization, relevance, and novelty
of recommendations.
( 2
min )
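The core assumption above, that users prefer items whose popularity is close to that of items they already own, can be turned into a toy scoring rule. The function below is a hypothetical illustration of that assumption, not PLIERS' actual formula:

```python
import numpy as np

def pliers_like_scores(user_item_pops, candidate_pops):
    """Score candidate items by how close their popularity is to the
    popularity of items the user already owns (illustrative rule only)."""
    user_item_pops = np.asarray(user_item_pops, dtype=float)
    candidate_pops = np.asarray(candidate_pops, dtype=float)
    # relative popularity gap between each candidate and each owned item
    gaps = np.abs(candidate_pops[:, None] - user_item_pops[None, :])
    rel = gaps / np.maximum(candidate_pops[:, None], user_item_pops[None, :])
    # score = 1 minus the smallest relative gap to any owned item
    return 1.0 - rel.min(axis=1)
```

For a user owning items with popularities 100 and 120, a candidate with popularity 110 scores far higher than a global hit with popularity 10000, matching the personalization-over-popularity tradeoff the abstract describes.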
We propose a kernel-spectral embedding algorithm for learning low-dimensional
nonlinear structures from high-dimensional and noisy observations, where the
datasets are assumed to be sampled from an intrinsically low-dimensional
manifold and corrupted by high-dimensional noise. The algorithm employs an
adaptive bandwidth selection procedure which does not rely on prior knowledge
of the underlying manifold. The obtained low-dimensional embeddings can be
further utilized for downstream purposes such as data visualization, clustering
and prediction. Our method is theoretically justified and practically
interpretable. Specifically, we establish the convergence of the final
embeddings to their noiseless counterparts when the dimension and size of the
samples are comparably large, and characterize the effect of the
signal-to-noise ratio on the rate of convergence and phase transition. We also
prove convergence of the embeddings to the eigenfunctions of an integral
operator defined by the kernel map of some reproducing kernel Hilbert space
capturing the underlying nonlinear structures. Numerical simulations and
analysis of three real datasets show the superior empirical performance of the
proposed method, compared to many existing methods, on learning various
manifolds in diverse applications.
( 2
min )
We provide the first finite-particle convergence rate for Stein variational
gradient descent (SVGD), a popular algorithm for approximating a probability
distribution with a collection of particles. Specifically, whenever the target
distribution is sub-Gaussian with a Lipschitz score, SVGD with n particles and
an appropriate step size sequence drives the kernel Stein discrepancy to zero
at an order 1/sqrt(log log n) rate. We suspect that the dependence on n can be
improved, and we hope that our explicit, non-asymptotic proof strategy will
serve as a template for future refinements.
( 2
min )
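The particle update analysed above can be sketched in a few lines. This is a minimal SVGD implementation with a fixed-bandwidth RBF kernel (the original algorithm typically uses a median-heuristic bandwidth, and the step-size schedule here is an illustrative constant):

```python
import numpy as np

def svgd_step(x, score, h=1.0, eps=0.1):
    """One SVGD update with the RBF kernel k(x, y) = exp(-||x - y||^2 / h)."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]        # diff[j, i] = x_j - x_i
    k = np.exp(-(diff ** 2).sum(-1) / h)        # kernel matrix
    grad_k = -2.0 * k[:, :, None] * diff / h    # d k(x_j, x_i) / d x_j
    # phi(x_i) = mean_j [ k(x_j, x_i) score(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (k.T @ score(x) + grad_k.sum(axis=0)) / n
    return x + eps * phi

# target: standard 2-D Gaussian, whose score is simply -x
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2)) * 0.1 + 2.0        # particles start off-target
for _ in range(500):
    x = svgd_step(x, lambda z: -z)
```

The attraction term pulls particles toward high-density regions while the kernel-gradient term repels them from each other, which is what makes the particle cloud spread out rather than collapse to the mode.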
In this paper, we consider a general observation model for restless
multi-armed bandit problems. The operation of the player needs to be based on
a feedback mechanism that is error-prone due to resource constraints or
environmental or intrinsic noises. By establishing a general probabilistic
model for dynamics of feedback/observation, we formulate the problem as a
restless bandit with a countable belief state space starting from an arbitrary
initial belief (a priori information). We apply the achievable region method
with partial conservation law (PCL) to the infinite-state problem and analyze
its indexability and priority index (Whittle index). Finally, we propose an
approximation process to transform the problem into one to which the AG
algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied.
Numerical experiments show that our algorithm has an excellent performance.
( 2
min )
The paper uses structured machine learning regressions for nowcasting with
panel data consisting of series sampled at different frequencies. Motivated by
the problem of predicting corporate earnings for a large cross-section of firms
with macroeconomic, financial, and news time series sampled at different
frequencies, we focus on the sparse-group LASSO regularization which can take
advantage of the mixed frequency time series panel data structures. Our
empirical results show the superior performance of our machine learning panel
data regression models over analysts' predictions, forecast combinations,
firm-specific time series regression models, and standard machine learning
methods.
( 2
min )
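The sparse-group LASSO penalty mentioned above has a known closed-form proximal operator: elementwise soft-thresholding followed by a groupwise norm shrinkage. A minimal sketch, with illustrative groups and penalty weights (this is the generic prox, not the paper's full mixed-frequency estimator):

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_group_lasso_prox(beta, groups, lam, alpha, step=1.0):
    """Prox of alpha*lam*||b||_1 + (1-alpha)*lam*sum_g ||b_g||_2:
    soft-threshold each coordinate, then shrink each group's norm."""
    out = soft_threshold(np.asarray(beta, dtype=float), step * alpha * lam)
    for g in groups:
        nrm = np.linalg.norm(out[g])
        if nrm > 0.0:
            out[g] *= max(0.0, 1.0 - step * (1.0 - alpha) * lam / nrm)
    return out

beta = np.array([3.0, -0.1, 0.05, 0.02])
prox = sparse_group_lasso_prox(beta, groups=[[0, 1], [2, 3]],
                               lam=1.0, alpha=0.5)
```

The second group is zeroed out entirely while the strong coefficient in the first group survives (shrunken), illustrating how the penalty selects both within and across groups of mixed-frequency predictors.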
Gaussian Processes (GPs) offer an attractive method for regression over
small, structured and correlated datasets. However, their deployment is
hindered by computational costs and limited guidelines on how to apply GPs
beyond simple low-dimensional datasets. We propose a framework to identify the
suitability of GPs to a given problem and how to set up a robust and
well-specified GP model. The guidelines formalise the decisions of experienced
GP practitioners, with an emphasis on kernel design and options for
computational scalability. The framework is then applied to a case study of
glacier elevation change yielding more accurate results at test time.
( 2
min )
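A well-specified GP of the kind the guidelines target can be set up in a few lines. This NumPy sketch uses an RBF kernel with hand-chosen hyperparameters (the framework's point is precisely that kernel design and scalability options deserve more care than this):

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x_tr, y_tr, x_te, lengthscale=1.0, variance=1.0, noise=1e-2):
    """Exact GP regression posterior via a Cholesky solve (1-D inputs)."""
    K = rbf(x_tr, x_tr, lengthscale, variance) + noise * np.eye(len(x_tr))
    K_s = rbf(x_tr, x_te, lengthscale, variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    mean = K_s.T @ alpha                     # posterior mean
    v = np.linalg.solve(L, K_s)
    var = rbf(x_te, x_te, lengthscale, variance).diagonal() - (v ** 2).sum(0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

x_tr = np.linspace(0.0, 3.0, 20)
y_tr = np.sin(x_tr)
mean, std = gp_posterior(x_tr, y_tr, np.array([1.5]))
```

The O(n^3) Cholesky factorization in the middle is the computational bottleneck the abstract alludes to, and the reason scalability options matter beyond small, structured datasets.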
BioAutoMATED, an open-source, automated machine-learning platform, aims to help democratize artificial intelligence for research labs.
( 8
min )
Amazon SageMaker is an end-to-end machine learning (ML) platform with wide-ranging features to ingest, transform, and measure bias in data, and train, deploy, and manage models in production with best-in-class compute and services such as Amazon SageMaker Data Wrangler, Amazon SageMaker Studio, Amazon SageMaker Canvas, Amazon SageMaker Model Registry, Amazon SageMaker Feature Store, Amazon SageMaker […]
( 8
min )
A diverse range of artists, fashionistas, musicians and the cinematic arts inspired the creative journey of Pedro Soares, aka Blendeered, and helped him fall in love with using 3D to create art.
( 7
min )
GFN Thursday arrives alongside the sweet Steam Summer Sale — with hundreds of PC games playable on GeForce NOW available during Valve’s special event for PC gamers. Also on sale, OCTOPATH TRAVELER and OCTOPATH TRAVELER II join the GeForce NOW library as a part of five new games coming to the service this week.
( 6
min )
A global team of medical providers is leveraging Holoportation, a Microsoft 3D capture and communication technology, to widen access to specialized care. Computer engineer Spencer Fowers and plastic surgeon Kwame Darko discuss the collaboration.
The post Collaborators: Holoportation™ communication technology with Spencer Fowers and Kwame Darko appeared first on Microsoft Research.
( 32
min )
GPT-3.5 Turbo, DALL·E and Whisper APIs are also generally available, and we are releasing a deprecation plan for older models of the Completions API, which will retire at the beginning of 2024.
( 5
min )
Amazon Polly is a service that turns text into lifelike speech. It enables the development of a whole class of applications that can convert text into speech in multiple languages. This service can be used by chatbots, audio books, and other text-to-speech applications in conjunction with other AWS AI or machine learning (ML) services. For […]
( 9
min )
Predictive maintenance is critical in automotive industries because it can avoid out-of-the-blue mechanical failures and reactive maintenance activities that disrupt operations. By predicting vehicle failures and scheduling maintenance and repairs, you’ll reduce downtime, improve safety, and boost productivity levels. What if we could apply deep learning techniques to common areas that drive vehicle failures, unplanned […]
( 11
min )
In the dynamic world of the Internet of Things (IoT), data integration plays a crucial role in harnessing the full potential of connected devices. By seamlessly combining data from diverse sources, data integration enables organizations to unlock valuable insights, optimize operations, and make informed decisions. This blog will explore the significance of data integration in…
The post Data integration in IoT environments: Enhancing connectivity and insights appeared first on Data Science Central.
( 21
min )
This is the first in a series of articles based on interviews with Intel technology leaders about AI/HPC acceleration.
The post Ushering in the 5th epoch of distributed computing with accelerated AI technologies appeared first on Data Science Central.
( 34
min )
In the movie The Lord of the Rings, the wizard Saruman says that “The hour is later than you think.” I was reminded of this phrase when I read a report from McKinsey, The economic potential of generative AI. There are some key findings on the future of AI which show you how fast…
The post The hour is later than you think for AI impacting our jobs appeared first on Data Science Central.
( 19
min )
As Web3 evolves and transforms into a more decentralized and user-centric ecosystem, the role of artificial intelligence or AI cannot be overstated. By leveraging its capabilities, AI is contributing to various aspects of the Web3 landscape, such as managing data, executing contracts, generating insights, securing identities, curating content, governing organizations, and enhancing user experiences. An…
The post Role of AI in Web3: Ensuring seamless content moderation for dating websites appeared first on Data Science Central.
( 23
min )
Three leading European generative AI startups joined NVIDIA founder and CEO Jensen Huang this week to talk about the new era of computing. More than 500 developers, researchers, entrepreneurs and executives from across Europe and further afield packed into the Spindler and Klatt, a sleek, riverside gathering spot in Berlin. Huang started the reception by…
( 6
min )
China electric vehicle maker XPENG Motors has announced its new G6 coupe SUV — featuring an NVIDIA-powered intelligent advanced driver assistance system — is now available to the China market. The G6 is XPENG’s first model featuring the company’s proprietary Smart Electric Platform Architecture (SEPA) 2.0, which aims to reduce development and manufacturing costs and…
( 5
min )
The increased frequency and severity of extreme weather and climate events could take a million lives and cost $1.7 trillion annually by 2050, according to the Munich Reinsurance Company. This underscores a critical need for accurate weather forecasting, especially with the rise in severe weather occurrences such as blizzards, hurricanes and heatwaves. AI and accelerated…
( 6
min )
AI and accelerated computing will help climate researchers achieve the miracles they need to achieve breakthroughs in climate research, NVIDIA founder and CEO Jensen Huang said during a keynote Monday at the Berlin Summit for the Earth Virtualization Engines initiative. “Richard Feynman once said that ‘what I can’t create, I don’t understand’ and that’s the…
( 6
min )
Companies across various industries create, scan, and store large volumes of PDF documents. In many cases, the content is text-heavy and often written in a different language and requires translation. To address this, you need an automated solution to extract the contents within these PDFs and translate them quickly and cost-efficiently. Many businesses have diverse […]
( 8
min )
In computer vision (CV), adding tags to identify objects of interest or bounding boxes to locate the objects is called labeling. It’s one of the prerequisite tasks to prepare training data to train a deep learning model. Hundreds of thousands of work hours are spent generating high-quality labels from images and videos for various CV […]
( 9
min )
We focus on the problem of market making in high-frequency trading. Market
making is a critical function in financial markets that involves providing
liquidity by buying and selling assets. However, the increasing complexity of
financial markets and the high volume of data generated by tick-level trading
make it challenging to develop effective market making strategies. To address
this challenge, we propose a deep reinforcement learning approach that fuses
tick-level data with periodic prediction signals to develop a more accurate and
robust market making strategy. Our results of market making strategies based on
different deep reinforcement learning algorithms under the simulation scenarios
and real data experiments in the cryptocurrency markets show that the proposed
framework outperforms existing methods in terms of profitability and risk
management.
( 2
min )
Biological data may be separated into primary data, such as gene expression,
and secondary data, such as pathways and protein-protein interactions. Methods
using secondary data to enhance the analysis of primary data are promising,
because secondary data have background information that is not included in
primary data. In this study, we proposed an end-to-end framework to integrally
handle secondary data to construct a classification model for primary data. We
applied this framework to cancer prognosis prediction using gene expression
data and a biological network. Cross-validation results indicated that our
model achieved higher accuracy compared with a deep neural network model
without background biological network information. Experiments conducted on
patient groups stratified by cancer type showed improvement in the area under
the ROC curve for many groups. For cancer types with high accuracy,
visualization and enrichment analysis identified contributing genes and
pathways. Known biomarkers and
novel biomarker candidates were identified through these experiments.
( 2
min )
Recently, deep learning has revolutionized the field of natural language
processing, with neural language models proving to be very effective for
next-word prediction. However, a rigorous theoretical explanation for their
success in the context of formal language theory has not yet been developed, as
it is unclear why neural language models can learn the combinatorial rules that
govern the next-word prediction task. In this paper, we study a class of formal
languages that can be used to model real-world examples of English sentences.
We construct neural language models that can solve the next-word prediction
this context with zero error. Our proof highlights the different roles of the
embedding layer and the fully connected component within the neural language
model.
( 2
min )
We examine the characteristic activation values of individual ReLU units in
neural networks. We refer to the corresponding set for such characteristic
activation values in the input space as the characteristic activation set of a
ReLU unit. We draw an explicit connection between the characteristic activation
set and learned features in ReLU networks. This connection leads to new
insights into why various neural network normalization techniques used in
modern deep learning architectures regularize and stabilize SGD optimization.
Utilizing these insights, we propose a geometric approach to parameterize ReLU
networks for improved feature learning. We empirically verify its usefulness
with less carefully chosen initialization schemes and larger learning rates. We
report improved optimization stability, faster convergence speed, and better
generalization performance.
( 2
min )
Practical density functional theory (DFT) owes its success to the
groundbreaking work of Kohn and Sham that introduced the exact calculation of
the non-interacting kinetic energy of the electrons using an auxiliary
mean-field system. However, the full power of DFT will not be unleashed until
the exact relationship between the electron density and the non-interacting
kinetic energy is found. Various attempts have been made to approximate this
functional, similar to the exchange-correlation functional, with much less
success due to the larger contribution of kinetic energy and its more non-local
nature. In this work we propose a new and efficient regularization method to
train density functionals based on deep neural networks, with particular
interest in the kinetic-energy functional. The method is tested on
(effectively) one-dimensional systems, including the hydrogen chain,
non-interacting electrons, and atoms of the first two periods, with excellent
results. For the atomic systems, the generalizability of the regularization
method is demonstrated by also training an exchange-correlation functional,
and the contrasting nature of the two functionals is discussed from a
machine-learning perspective.
( 2
min )
A Bayesian filtering algorithm is developed for a class of state-space
systems that can be modelled via Gaussian mixtures. In general, the exact
solution to this filtering problem involves an exponential growth in the number
of mixture terms and this is handled here by utilising a Gaussian mixture
reduction step after both the time and measurement updates. In addition, a
square-root implementation of the unified algorithm is presented and this
algorithm is profiled on several simulated systems. This includes the state
estimation for two non-linear systems that are strictly outside the class
considered in this paper.
( 2
min )
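The Gaussian mixture reduction step can be illustrated with a minimal moment-matching merge of the closest pair of components. The nearest-means pairing criterion is a simplification for the sketch; practical reduction schemes often use distribution-level distances instead:

```python
import numpy as np

def merge_closest(weights, means, variances):
    """Reduce a 1-D Gaussian mixture by one component: merge the pair with
    the closest means, matching the merged component's first two moments."""
    w = np.asarray(weights, dtype=float)
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    n = len(w)
    i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda ab: abs(m[ab[0]] - m[ab[1]]))
    w_m = w[i] + w[j]
    m_m = (w[i] * m[i] + w[j] * m[j]) / w_m
    # moment matching: merged variance absorbs the spread between means
    v_m = (w[i] * (v[i] + (m[i] - m_m) ** 2)
           + w[j] * (v[j] + (m[j] - m_m) ** 2)) / w_m
    keep = [k for k in range(n) if k not in (i, j)]
    return (np.append(w[keep], w_m), np.append(m[keep], m_m),
            np.append(v[keep], v_m))

w2, m2, v2 = merge_closest([0.5, 0.3, 0.2], [0.0, 0.1, 5.0], [1.0, 1.0, 1.0])
```

Applying such a merge after each time and measurement update caps the otherwise exponential growth in the number of mixture terms while preserving the overall mean of the mixture.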
Recently, a new class of non-convex optimization problems motivated by the
statistical problem of learning an acyclic directed graphical model from data
has attracted significant interest. While existing work uses standard
first-order optimization schemes to solve this problem, proving the global
optimality of such approaches has proven elusive. The difficulty lies in the
fact that unlike other non-convex problems in the literature, this problem is
not "benign", and possesses multiple spurious solutions that standard
approaches can easily get trapped in. In this paper, we prove that a simple
path-following optimization scheme globally converges to the global minimum of
the population loss in the bivariate setting.
( 2
min )
In this work, a comprehensive numerical study involving analysis and
experiments shows why a two-layer neural network has difficulties handling high
frequencies in approximation and learning when machine precision and
computation cost are important factors in real practice. In particular, the
following fundamental computational issues are investigated: (1) the best
accuracy one can achieve given a finite machine precision, (2) the computation
cost to achieve a given accuracy, and (3) stability with respect to
perturbations. The key to the study is the spectral analysis of the
corresponding Gram matrix of the activation functions which also shows how the
properties of the activation function play a role in the picture.
( 2
min )
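The kind of spectral analysis the abstract points to can be previewed numerically. The random-feature setup below is an illustrative toy, not the paper's exact construction: it builds the Gram matrix of random ReLU units on a 1-D grid and checks how concentrated its spectrum is:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)

# random hidden units of a two-layer ReLU network: phi_k(x) = relu(w_k x + b_k)
w = rng.normal(size=100)
b = rng.normal(size=100)
Phi = np.maximum(0.0, np.outer(x, w) + b)   # (200 inputs, 100 units)

# Gram matrix of the activations and its eigenvalue spectrum
G = Phi @ Phi.T
eig = np.sort(np.linalg.eigvalsh(G))[::-1]
top5_share = eig[:5].sum() / eig.sum()
```

The spectrum decays very fast: nearly all the spectral mass sits in a handful of directions, so target components along the trailing eigenvectors (the high-frequency ones) are amplified by huge inverse eigenvalues, which is where finite machine precision and computation cost start to bite.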
The Rashomon Effect describes the following phenomenon: for a given dataset
there may exist many models with equally good performance but with different
solution strategies. The Rashomon Effect has implications for Explainable
Machine Learning, especially for the comparability of explanations. We provide
a unified view on three different comparison scenarios and conduct a
quantitative evaluation across different datasets, models, attribution methods,
and metrics. We find that hyperparameter-tuning plays a role and that metric
selection matters. Our results provide empirical support for previously
anecdotal evidence and exhibit challenges for both scientists and
practitioners.
( 2
min )
Tensor network (TN) representation is a powerful technique for data analysis
and machine learning. It practically involves a challenging TN structure search
(TN-SS) problem, which aims to search for the optimal structure to achieve a
compact representation. Existing TN-SS methods mainly adopt a bi-level
optimization method that leads to excessive computational costs due to repeated
structure evaluations. To address this issue, we propose an efficient
integrated (single-level) method named SVD-inspired TN decomposition
(SVDinsTN), eliminating the need for repeated tedious structure evaluation. By
inserting a diagonal factor for each edge of the fully-connected TN, we
calculate TN cores and diagonal factors simultaneously, with factor sparsity
revealing the most compact TN structure. Experimental results on real-world
data demonstrate that SVDinsTN achieves roughly $10$ to $10^3$ times
acceleration in runtime compared to the existing TN-SS methods while
maintaining a comparable level of representation ability.
( 2
min )
We study the problem of learning mixtures of Gaussians with censored data.
Statistical learning with censored data is a classical problem with numerous
practical applications; however, finite-sample guarantees for even simple
latent variable models such as Gaussian mixtures are missing. Formally, we are
given censored data from a mixture of univariate Gaussians $$
\sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2), $$ i.e. the sample is observed
only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and
the means $\mu_i$. We propose an algorithm that takes only
$\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the
means $\mu_i$ within $\varepsilon$ error.
( 2
min )
Energy time-series analysis describes the process of analyzing past energy
observations and possibly external factors so as to predict the future.
Different tasks are involved in the general field of energy time-series
analysis and forecasting, with electric load demand forecasting, personalized
energy consumption forecasting, as well as renewable energy generation
forecasting being among the most common ones. Following the exceptional
performance of Deep Learning (DL) in a broad area of vision tasks, DL models
have successfully been utilized in time-series forecasting tasks. This paper
aims to provide insight into various DL methods geared towards improving the
performance in energy time-series forecasting tasks, with special emphasis on
the Greek energy market, and equip the reader with the necessary knowledge to apply
these methods in practice.
( 2
min )
This research paper focuses on the implementation of Radial Basis Function
(RBF) Support Vector Machines (SVM) for classifying asteroid orbits. Asteroids
are important astronomical objects, and their orbits play a crucial role in
understanding the dynamics of the solar system. The International Astronomical
Union maintains data archives that provide a playground to experiment with
various machine-learning techniques. In this study, we explore the application
of the RBF SVM algorithm to classify asteroids. The results show that the RBF
SVM algorithm provides good efficiency and accuracy on this dataset. We also
analyze the impact of various parameters on the performance of the RBF SVM
algorithm and present the optimal parameter settings. Our study highlights the
importance of using machine learning techniques for classifying asteroid orbits
and the effectiveness of the RBF SVM algorithm in this regard.
( 2
min )
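An RBF SVM classifier of the kind described above is a few lines of scikit-learn. The two-Gaussian toy data below (loosely styled as semi-major axis and eccentricity) is an illustrative stand-in for the IAU archive, which this sketch does not download:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# toy stand-in for two asteroid classes in (semi-major axis, eccentricity)
X0 = rng.normal(loc=[2.2, 0.10], scale=0.1, size=(200, 2))
X1 = rng.normal(loc=[2.6, 0.20], scale=0.1, size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

The parameters the abstract mentions tuning, the regularization strength `C` and the kernel width `gamma`, are exactly the two knobs exposed here; a grid search over them is the usual way to find the optimal settings.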
Medical image segmentation is particularly critical as a prerequisite for
relevant quantitative analysis in the treatment of clinical diseases. For
example, in clinical cervical cancer radiotherapy, fast and accurate
segmentation of organs and tumors in subabdominal MRI images can optimize the
clinical radiotherapy process. Traditional approaches rely on manual annotation
by specialist doctors, which is time-consuming and laborious; automatic organ
segmentation of subabdominal MRI images is therefore a valuable research topic.
( 2
min )
Despite recent developments in machine learning, most learning systems remain
"black boxes" whose performance cannot be understood or derived. With the rise
of public concern about safety and privacy,
designing an explainable learning system has become a new trend in machine
learning. In general, many machine learning problems are formulated as
minimizing (or maximizing) some loss function. Since real data are most likely
generated from non-linear models, the loss function is non-convex in general.
Unlike the convex optimization problem, gradient descent algorithms will be
trapped in spurious local minima in solving non-convex optimization. Therefore,
it is challenging to provide explainable algorithms when studying non-convex
optimization problems. In this thesis, two popular non-convex problems are
studied: (1) low-rank matrix completion and (2) neural network learning.
( 2
min )
A dynamical system produces a dependent multivariate sequence called
dynamical time series, developed with an evolution function. As variables in
the dynamical time series at the current time-point usually depend on all the
variables at the previous time-point, existing studies forecast the variables
at the future time-point by estimating the evolution function. However, some
variables in the dynamical time-series are missing in some practical
situations. In this study, we propose an autoregressive with slack time series
(ARS) model. The ARS model involves the simultaneous estimation of the evolution
function and the underlying missing variables as a slack time series, with the
aid of the time-invariance and linearity of the dynamical system. This study
empirically demonstrates the effectiveness of the proposed ARS model.
( 2
min )
In this paper we demonstrate both theoretically and numerically that
neural networks can detect model-free static arbitrage opportunities whenever
the market admits some. Due to the use of neural networks, our method can be
applied to financial markets with a high number of traded securities and
ensures almost immediate execution of the corresponding trading strategies. To
demonstrate its tractability, effectiveness, and robustness we provide examples
using real financial data. From a technical point of view, we prove that a
single neural network can approximately solve a class of convex semi-infinite
programs, which is the key result in order to derive our theoretical results
that neural networks can detect model-free static arbitrage strategies whenever
the financial market admits such opportunities.
( 2
min )
Machine learning (ML) and tensor-based methods have been of significant
interest to the scientific community for the last few decades. In a previous
work we presented a novel tensor-based system identification framework to ease
the computational burden of tensor-only architectures while still being able to
achieve exceptionally good performance. However, the derived approach can only
process real-valued problems and is therefore not directly applicable to a
wide range of signal processing and communications problems, which often deal
with complex-valued systems. In this work we therefore derive two new
architectures to allow the processing of complex-valued signals, and show that
these extensions are able to surpass the trivial, complex-valued extension of
the original architecture in terms of performance, while only requiring a
slight overhead in computational resources to allow for complex-valued
operations.
( 2
min )
Numerical data imputation algorithms replace missing values by estimates to
leverage incomplete data sets. Current imputation methods seek to minimize the
error between the unobserved ground truth and the imputed values. But this
strategy can create artifacts leading to poor imputation in the presence of
multimodal or complex distributions. To tackle this problem, we introduce the
$k$NN$\times$KDE algorithm: a data imputation method combining nearest neighbor
estimation ($k$NN) and density estimation with Gaussian kernels (KDE). We
compare our method with previous data imputation methods using artificial and
real-world data with different data missing scenarios and various data missing
rates, and show that our method copes with complex original data structures,
yields lower data imputation errors, and provides probabilistic estimates with
higher likelihood than current methods. We release the code in open-source for
the community: https://github.com/DeltaFloflo/knnxkde
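The combination can be sketched as follows (an illustrative reconstruction, not the released code; see the linked repository for the actual implementation, and note the neighbour count and bandwidth here are arbitrary): find the $k$ nearest neighbours on the observed coordinates, then place a Gaussian kernel on each neighbour's value of the missing feature, giving both a point estimate and a density.

```python
import numpy as np

def knn_kde_impute(X_complete, x_obs, obs_idx, miss_idx, k=5, h=0.3):
    # k nearest neighbours measured on the observed coordinates only
    d = np.linalg.norm(X_complete[:, obs_idx] - x_obs, axis=1)
    nn = np.argsort(d)[:k]
    vals = X_complete[nn, miss_idx]          # kernel centres
    point = vals.mean()                      # mean of the kernel mixture
    def density(v):                          # mixture of k Gaussian kernels
        return np.mean(np.exp(-0.5 * ((v - vals) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    return point, density

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
X[:, 1] = 0.5 * X[:, 0] + 0.1 * rng.normal(size=200)   # correlated features
x = np.array([1.0])                                    # feature 0 observed
imputed, dens = knn_kde_impute(X, x, obs_idx=[0], miss_idx=1)
```

Unlike a pure error-minimizing imputer, the returned density can represent multimodal conditionals rather than collapsing them to a single average.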
( 2
min )
We study the problem of learning mixtures of Gaussians with censored data.
Statistical learning with censored data is a classical problem with numerous
practical applications; however, finite-sample guarantees for even simple
latent variable models such as Gaussian mixtures are missing. Formally, we are
given censored data from a mixture of univariate Gaussians $$
\sum_{i=1}^k w_i \mathcal{N}(\mu_i,\sigma^2), $$ i.e. the sample is observed
only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and
the means $\mu_i$. We propose an algorithm that takes only
$\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the
means $\mu_i$ within $\varepsilon$ error.
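To make the censoring model concrete, here is a small sketch (our notation, not the paper's algorithm) of the observed-data log-likelihood when $S$ is an interval $[a, b]$: the observed density is the mixture density renormalized by the mixture mass on $S$.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def censored_loglik(xs, w, mu, sigma, a, b):
    # probability that a draw from the mixture lands in S = [a, b]
    mass = sum(wi * (normal_cdf(b, m, sigma) - normal_cdf(a, m, sigma))
               for wi, m in zip(w, mu))
    ll = 0.0
    for x in xs:
        assert a <= x <= b, "censored samples must lie in S"
        ll += math.log(sum(wi * normal_pdf(x, m, sigma) for wi, m in zip(w, mu)) / mass)
    return ll

ll = censored_loglik([0.1, 0.4, 1.2], w=[0.5, 0.5], mu=[0.0, 1.0],
                     sigma=1.0, a=-0.5, b=1.5)
```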
( 2
min )
This paper revisits an adaptation of the random forest algorithm for
Fr\'echet regression, addressing the challenge of regression in the context of
random objects in metric spaces. Recognizing the limitations of previous
approaches, we introduce a new splitting rule that avoids the computationally
expensive Fr\'echet mean computation by substituting a medoid-based approach.
We validate this approach by demonstrating its
asymptotic equivalence to Fr\'echet mean-based procedures and establish the
consistency of the associated regression estimator. The paper provides a sound
theoretical framework and a more efficient computational approach to Fr\'echet
regression, broadening its application to non-standard data types and complex
use cases.
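The medoid substitution is easy to illustrate on Euclidean toy data (the paper's setting is a general metric space; this sketch is only indicative): the medoid minimizes the same sum-of-squared-distances criterion as the Fr\'echet mean, but over the sample points only, so it needs nothing beyond pairwise distances.

```python
import numpy as np

def medoid(Y, dist):
    # the sample point minimizing sum_j d(y_i, y_j)^2
    n = len(Y)
    costs = [sum(dist(Y[i], Y[j]) ** 2 for j in range(n)) for i in range(n)]
    return Y[int(np.argmin(costs))]

rng = np.random.default_rng(2)
Y = rng.normal(size=(100, 2))
m = medoid(Y, lambda a, b: np.linalg.norm(a - b))
frechet_mean = Y.mean(axis=0)      # the Euclidean Frechet mean is the average
gap = np.linalg.norm(m - frechet_mean)
```

In the Euclidean case the medoid is simply the sample point nearest the mean, which is why the asymptotic equivalence mentioned in the abstract is plausible.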
( 2
min )
Wasserstein gradient flows on probability measures have found a host of
applications in various optimization problems. They typically arise as the
continuum limit of exchangeable particle systems evolving by some mean-field
interaction involving a gradient-type potential. However, in many problems,
such as in multi-layer neural networks, the so-called particles are edge
weights on large graphs whose nodes are exchangeable. Such large graphs are
known to converge to continuum limits called graphons as their size grows to
infinity. We show that the Euclidean gradient flow of a suitable function of
the edge-weights converges to a novel continuum limit given by a curve on the
space of graphons that can be appropriately described as a gradient flow or,
more technically, a curve of maximal slope. Several natural functions on
graphons, such as homomorphism functions and the scalar entropy, are covered by
our set-up, and the examples have been worked out in detail.
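The "curve of maximal slope" language has a standard metric-space meaning, recalled here in the usual Ambrosio-Gigli-Savar\'e form (the paper's space is the space of graphons with an appropriate metric $d$):

```latex
% Curve of maximal slope (standard definition for a generic metric space
% (X, d); the paper takes X to be graphon space):
% t -> mu_t is a curve of maximal slope for F if, for a.e. t,
\frac{\mathrm{d}}{\mathrm{d}t} F(\mu_t)
  \le -\tfrac{1}{2}\,|\mu_t'|^{2}(t) - \tfrac{1}{2}\,|\partial F|^{2}(\mu_t),
% where |mu_t'| is the metric derivative of the curve and |partial F|
% is the local slope of the functional:
|\partial F|(\mu) = \limsup_{\nu \to \mu}
  \frac{\bigl(F(\mu) - F(\nu)\bigr)^{+}}{d(\mu,\nu)}.
```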
( 2
min )
This paper proposes a multi-object tracking (MOT) algorithm for traffic
monitoring using a drone equipped with optical and thermal cameras. Object
detections on the images are obtained using a neural network for each type of
camera. The cameras are modelled as direction-of-arrival (DOA) sensors. Each
DOA detection follows a von Mises-Fisher distribution, whose mean direction is
obtained by projecting the vehicle's ground position toward the camera. We then
use the trajectory Poisson multi-Bernoulli mixture filter (TPMBM), which is a
Bayesian MOT algorithm, to optimally estimate the set of vehicle trajectories.
We have also developed a parameter estimation algorithm for the measurement
model. We have tested the accuracy of the resulting TPMBM filter on synthetic
and experimental data sets.
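A minimal sketch of the described measurement model (our reconstruction; the camera geometry and concentration parameter are invented): the mean direction is the unit vector from the camera to the vehicle's ground position, and detections follow a von Mises-Fisher density around it.

```python
import numpy as np

def mean_direction(vehicle_xy, camera_xyz):
    # unit vector from the camera toward the vehicle's position on the ground
    p = np.array([vehicle_xy[0], vehicle_xy[1], 0.0]) - camera_xyz
    return p / np.linalg.norm(p)

def vmf_logpdf(u, mu, kappa):
    # 3-D von Mises-Fisher log-density; normalizer C(k) = k / (4 pi sinh k),
    # computed in log form to stay stable for large kappa
    log_sinh = kappa + np.log1p(-np.exp(-2.0 * kappa)) - np.log(2.0)
    logC = np.log(kappa) - np.log(4.0 * np.pi) - log_sinh
    return logC + kappa * float(u @ mu)

cam = np.array([0.0, 0.0, 50.0])            # drone 50 m above the origin
mu = mean_direction([30.0, 40.0], cam)
u_exact = mu                                 # detection exactly on target
u_off = mean_direction([35.0, 40.0], cam)    # detection of a displaced point
better = vmf_logpdf(u_exact, mu, kappa=200.0) > vmf_logpdf(u_off, mu, kappa=200.0)
```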
( 2
min )
We propose a new nonparametric modeling framework for causal inference when
outcomes depend on how agents are linked in a social or economic network. Such
network interference is the subject of a large literature on treatment
spillovers, social interactions, social learning, information diffusion,
disease and financial contagion, social capital formation, and more. Our
approach works by
first characterizing how an agent is linked in the network using the
configuration of other agents and connections nearby as measured by path
distance. The impact of a policy or treatment assignment is then learned by
pooling outcome data across similarly configured agents. We demonstrate the
approach by proposing an asymptotically valid test for the hypothesis of policy
irrelevance/no treatment effects and bounding the mean-squared error of a
k-nearest-neighbor estimator for the average or distributional policy
effect/treatment response.
( 2
min )
We study causal inference and efficient estimation for the expected number of
recurrent events in the presence of a terminal event. We define our estimand as
the vector comprising both the expected number of recurrent events and the
failure survival function evaluated along a sequence of landmark times. We
identify the estimand in the presence of right-censoring and causal selection
as an observed data functional under coarsening at random, derive the
nonparametric efficiency bound, and propose a multiply-robust estimator that
achieves the bound and permits nonparametric estimation of nuisance parameters.
Throughout, no absolute continuity assumption is made on the underlying
probability distributions of failure, censoring, or the observed data.
Additionally, we derive the class of influence functions when the coarsening
distribution is known and review how published estimators may belong to the
class. Along the way, we highlight some interesting inconsistencies in the
causal lifetime analysis literature.
( 2
min )
I've noted before that because AI detectors produce false positives, it's unethical to use them to detect cheating.
Now there's a new study that shows it's even worse. Not only do AI detectors falsely flag human-written text as AI-written, the way in
( 6
min )
Taking adversarial training from this previous article as baseline, this article introduces a new, confidence-calibrated variant of adversarial training that addresses two significant flaws: First, trained with L∞ adversarial examples, adversarial training is not robust against L2 ones. Second, it incurs a significant increase in (clean) test error. Confidence-calibrated adversarial training addresses these problems by encouraging lower confidence on adversarial examples and subsequently rejecting them.
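The confidence calibration can be sketched as an interpolated training target (a simplified rendering of the idea; the exact interpolation schedule in the post may differ): the target distribution moves from the one-hot label toward the uniform distribution as the perturbation approaches the budget eps, so large perturbations are trained toward low confidence and can later be rejected by a confidence threshold.

```python
import numpy as np

def calibrated_target(one_hot, delta_inf, eps, num_classes, rho=1.0):
    # interpolation weight decays as the L-inf perturbation size grows
    lam = (1.0 - min(delta_inf / eps, 1.0)) ** rho
    uniform = np.full(num_classes, 1.0 / num_classes)
    return lam * one_hot + (1.0 - lam) * uniform

y = np.array([1.0, 0.0, 0.0, 0.0])
t_small = calibrated_target(y, delta_inf=0.0, eps=0.03, num_classes=4)   # clean input
t_large = calibrated_target(y, delta_inf=0.03, eps=0.03, num_classes=4)  # at budget
```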
The post Generalizing Adversarial Robustness with Confidence-Calibrated Adversarial Training in PyTorch appeared first on David Stutz.
( 10
min )
Training artificial neural networks with data from real brains can make computer vision more robust.
( 10
min )
A new dataset can help scientists develop automatic systems that generate richer, more descriptive captions for online charts.
( 10
min )
Cost of poor quality is top of mind for manufacturers. Quality defects increase scrap and rework costs, decrease throughput, and can impact customers and company reputation. Quality inspection on the production line is crucial for maintaining quality standards. In many cases, human visual inspection is used to assess the quality and detect defects, which can […]
( 10
min )
Not long ago, I published an article entitled “The Sound that Data Makes”. The goal was turning data — random noise in this case — into music. The hope was that by “listening” to your data, you could gain a different kind of insight, not conveyed by visualizations or tabular summaries. This article is a… Read More »The music of the Riemann Hypothesis: Sound Generation in Python
The post The music of the Riemann Hypothesis: Sound Generation in Python appeared first on Data Science Central.
( 22
min )
Organizations are continuously investing time and effort in developing intelligent recommendation solutions to serve customized and relevant content to their users. The goals can be many: transform the user experience, generate meaningful interaction, and drive content consumption. Some of these solutions use common machine learning (ML) models built on historical interaction patterns, user demographic attributes, […]
( 10
min )
Fine-tuning large language models (LLMs) allows you to adjust open-source foundational models to achieve improved performance on your domain-specific tasks. In this post, we discuss the advantages of using Amazon SageMaker notebooks to fine-tune state-of-the-art open-source models. We utilize Hugging Face’s parameter-efficient fine-tuning (PEFT) library and quantization techniques through bitsandbytes to support interactive fine-tuning of […]
( 9
min )
Imagine an AI model that can seamlessly generate high-quality content across text, images, video, and audio, all at once. Such a model would more accurately capture the multimodal nature of the world and human comprehension, seamlessly consolidate information from a wide range of sources, and enable strong immersion in human-AI interactions. This could transform the […]
The post Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation appeared first on Microsoft Research.
( 10
min )
MUE Studio, founded by 3D artists Minjin Kang and Mijoo Kim, specializes in art direction, photography and 3D design for campaigns and installations.
( 7
min )
It’s a jam-packed July with 14 newly supported titles in the GeForce NOW library, including Remnant II from Gunfire Games and Gearbox Publishing. Need a new adventure? Check out the nine additions streaming from the cloud this week. Plus, the Steam Summer Sale kicks off this week, and many supported titles in the GeForce NOW Read article >
( 5
min )
In the first part of this blog, we discussed how coding could be a collaborative experience using tools like the GitHub Copilot. In the second part, we will explore the impact and significance of collaboration due to GitHub Copilot on the wider developer ecosystem. As we have seen, developers have a specific definition of collaboration… Read More »Will coding be a collaborative experience using GitHub Copilot? – Part two
The post Will coding be a collaborative experience using GitHub Copilot? – Part two appeared first on Data Science Central.
( 21
min )
The world of higher education is undergoing a transformative shift as artificial intelligence (AI) continues to reshape various aspects of our society. From classrooms to career development, the integration of AI and its impact on learning is undeniable. In this article, we will explore the intersection of AI, certification, and higher education, and delve into… Read More »Navigating the future of learning: AI, certification, and higher education
The post Navigating the future of learning: AI, certification, and higher education appeared first on Data Science Central.
( 27
min )
Figure 1: CoarsenConf architecture.
(I) The encoder $q_\phi(z| X, \mathcal{R})$ takes the fine-grained (FG) ground truth conformer $X$, RDKit approximate conformer $\mathcal{R}$ , and coarse-grained (CG) conformer $\mathcal{C}$ as inputs (derived from $X$ and a predefined CG strategy), and outputs a variable-length equivariant CG representation via equivariant message passing and point convolutions.
(II) Equivariant MLPs are applied to learn the mean and log variance of both the posterior and prior distributions.
(III) The posterior (training) or prior (inference) is sampled and fed into the Channel Selection module, where an attention layer is used to learn the optimal pathway from CG to FG structure.
(IV) Given the FG latent vector and the RDKit approximation, the decoder $p_\theta…
( 5
min )
In today’s digital landscape, data privacy, and security have become the most critical concerns for businesses across industries. With the ever-evolving threat of data breaches, unauthorized access, and privacy violation, companies are increasingly seeking innovative ways to protect their digital assets and sensitive information. One such solution that helps businesses significantly safeguard their crucial information… Read More »How decentralized apps can help businesses improve data security and privacy
The post How decentralized apps can help businesses improve data security and privacy appeared first on Data Science Central.
( 20
min )
The benefits, types, and processes of data transformation and how it contributes to data management, integration, and new technologies.
The post Data transformation 101: Process and new technologies appeared first on Data Science Central.
( 22
min )
In the era of big data and AI, harnessing weather data to predict, plan, and optimize various industries has become an indispensable practice. Today, we will delve into the fascinating process of turning this voluminous weather data into actionable insights. By combining cutting-edge technology, analytical models, and industrial applications, we’ll explore how weather data can… Read More »Harnessing the power of weather data: A guide to actionable insights
The post Harnessing the power of weather data: A guide to actionable insights appeared first on Data Science Central.
( 22
min )
MAGE merges the two key tasks of image generation and recognition, typically trained separately, into a single system.
( 8
min )
MIT alumnus’ platform taps the wisdom of crowds to label medical data for AI companies.
( 10
min )
Public health organizations have a wealth of data about different types of diseases, health trends, and risk factors. Their staff has long used statistical models and regression analyses to make important decisions such as targeting populations with the highest risk factors for a disease with therapeutics, or forecasting the progression of concerning outbreaks. When public […]
( 8
min )
Generative AI technology is improving rapidly, and it’s now possible to generate text and images based on text input. Stable Diffusion is a text-to-image model that empowers you to create photorealistic applications. You can easily generate images from text using Stable Diffusion models through Amazon SageMaker JumpStart. The following are examples of input texts and […]
( 10
min )
When making financial decisions, it’s important to look at the big picture — say, one taken from a drone, satellite or AI-powered sensor. The emerging field of spatial finance harnesses AI insights from remote sensors and aerial imagery to help banks, insurers, investment firms and businesses analyze risks and opportunities, enable new services and products, Read article >
( 7
min )
A trio of top scientists is helping lead one of the most ambitious efforts in the history of computing — building a digital twin of Earth. Peter Bauer, Bjorn Stevens and Francisco “Paco” Doblas-Reyes agree that a digital twin of Earth needs to support resolutions down to a kilometer so a growing set of users Read article >
( 7
min )
It worked like magic. Computer vision algorithms running in a data center saw that a disease was about to infect a distant wheat field in India. Sixteen days later, workers in the field found the first evidence of the outbreak. It was the kind of wizardry people like Vinay Indraganti call digital transformation. He’s practiced Read article >
( 6
min )
Scientists at Matice Biosciences are using AI to study the regeneration of tissues in animals known as super-regenerators, such as salamanders and planarians. The goal of the research is to develop new treatments that will help humans heal from injuries without scarring. On the latest episode of NVIDIA’s AI Podcast, host Noah Kravtiz spoke with Read article >
( 5
min )
Amazon SageMaker JumpStart is a machine learning (ML) hub that can help you accelerate your ML journey. With SageMaker JumpStart, you can discover and deploy publicly available and proprietary foundation models to dedicated Amazon SageMaker instances for your generative AI applications. SageMaker JumpStart allows you to deploy foundation models from a network isolated environment, and […]
( 10
min )
This blog post is co-written with Marat Adayev and Dmitrii Evstiukhin from Provectus. When machine learning (ML) models are deployed into production and employed to drive business decisions, the challenge often lies in the operation and management of multiple models. Machine Learning Operations (MLOps) provides the technical solution to this issue, assisting organizations in managing, […]
( 9
min )
Welcome to the shining world of beauty and wellness. This is where makeup artists, skincare devotees, and beauty enthusiasts come together to find the right potion to enhance their beauty. There is, however, a comical conundrum hidden amongst the sea of cosmetic products – the constant struggle to categorize them all! Let’s explore the mysteries… Read More »Cosmetic product recognition system for product categorization using AI & ML
The post Cosmetic product recognition system for product categorization using AI & ML appeared first on Data Science Central.
( 20
min )
State-of-the-art models like GPT-4 and PaLM 2 have demonstrated the ability to perform complex tasks requiring reasoning and decision-making, pushing the boundaries of automated processes. Adding to this advancement, OpenAI’s recent API update empowers developers to define functions and parameters when prompting ‘gpt-4’ and ‘gpt-3.5’ models, making the automation of tasks more practical. … Read More »Automation Game-Changer: Exploring GPT Function Call with AWS S3 Integration
The post Automation Game-Changer: Exploring GPT Function Call with AWS S3 Integration appeared first on Data Science Central.
( 23
min )
Leading users and industry-standard benchmarks agree: NVIDIA H100 Tensor Core GPUs deliver the best AI performance, especially on the large language models (LLMs) powering generative AI. H100 GPUs set new records on all eight tests in the latest MLPerf training benchmarks released today, excelling on a new MLPerf test for generative AI. That excellence is Read article >
( 6
min )
Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who accelerate 3D workflows and create virtual worlds using NVIDIA Omniverse, a development platform built on Universal Scene Description, aka OpenUSD. As augmented reality (AR) becomes more prominent and accessible across the globe, Kiryl Sidarchuk is Read article >
( 6
min )
While generative AI is a relatively new household term, drug discovery company Insilico Medicine has been using it for years to develop new therapies for debilitating diseases. The company’s early bet on deep learning is bearing fruit — a drug candidate discovered using its AI platform is now entering Phase 2 clinical trials to treat Read article >
( 6
min )
Picture a world where computing is not limited by the binary confines of zeros and ones, but instead, is free to explore the vast possibilities of continuous value data. Over the past three years a team of Microsoft researchers has been developing a new kind of analog optical computer that uses photons and electrons to […]
The post Unlocking the future of computing: The Analog Iterative Machine’s lightning-fast approach to optimization appeared first on Microsoft Research.
( 14
min )
Machine learning (ML) administrators play a critical role in maintaining the security and integrity of ML workloads. Their primary focus is to ensure that users operate with the utmost security, adhering to the principle of least privilege. However, accommodating the diverse needs of different user personas and creating appropriate permission policies can sometimes impede agility. […]
( 7
min )
Will coding be a collaborative experience using GitHub Copilot? – Part one GitHub recently released a survey about developer experience which claimed that “AI is here and it’s being used at scale. 92% of U.S.-based developers are already using AI coding tools both in and outside of work.” This metric (92%) has garnered some attention… Read More »Will coding be a collaborative experience using GitHub Copilot? – Part one
The post Will coding be a collaborative experience using GitHub copilot? – Part one appeared first on Data Science Central.
( 20
min )
Artificial Intelligence (AI) has emerged as a revolutionary technology that is transforming various industries, and one area where it is making a significant impact is localization. Localization refers to the process of adapting products, services, and content to meet the cultural, linguistic, and functional requirements of a specific target market. With the advent of AI,… Read More »Artificial Intelligence and localization: How AI is changing the landscape
The post Artificial Intelligence and localization: How AI is changing the landscape appeared first on Data Science Central.
( 21
min )
There probably isn’t a better time than now to develop an app for your business. By the end of 2023, mobile apps are expected to generate over $935 billion. Customers are hungry for apps that can provide instant access to services. Of course, simply having an app isn’t good enough. Consumers will only use your… Read More »Application analytics: How to leverage analytics during app creation
The post Application analytics: How to leverage analytics during app creation appeared first on Data Science Central.
( 23
min )
Tiny deep learning has attracted increasing attention driven by the
substantial demand for deploying deep learning on numerous intelligent
Internet-of-Things devices. However, it is still challenging to unleash tiny
deep learning's full potential on both large-scale datasets and downstream
tasks due to the under-fitting issues caused by the limited model capacity of
tiny neural networks (TNNs). To this end, we propose a framework called
NetBooster to empower tiny deep learning by augmenting the architectures of
TNNs via an expansion-then-contraction strategy. Extensive experiments show
that NetBooster consistently outperforms state-of-the-art tiny deep learning
solutions.
( 2
min )
The precise tracking and prediction of polar ice layers can unveil historic
trends in snow accumulation. In recent years, airborne radar sensors, such as
the Snow Radar, have been shown to be able to measure these internal ice layers
over large areas with a fine vertical resolution. In our previous work, we
found that temporal graph convolutional networks perform reasonably well in
predicting future snow accumulation when given temporal graphs containing deep
ice layer thickness. In this work, we experiment with a graph attention
network-based model and use it to predict more annual snow accumulation data
points from fewer input data points on a larger dataset. We found that these
large changes only very slightly negatively impacted performance.
( 2
min )
Existing studies addressing the gender bias of pre-trained language models
usually build a small gender-neutral data set and conduct a second-phase
pre-training on the model with such data. However, given the limited size and
concentrated focus of the gender-neutral data, catastrophic forgetting would
occur during second-phase pre-training. Forgetting information in the original
training data may damage the model's downstream performance by a large margin.
In this work, we empirically show that catastrophic forgetting occurs in such
methods by evaluating them with general NLP tasks in GLUE. Then, we propose a
new method, GEnder Equality Prompt (GEEP), to improve gender fairness of
pre-trained models with less forgetting. GEEP freezes the pre-trained model and
learns gender-related prompts with gender-neutral data. Empirical results show
that GEEP not only achieves SOTA performance on gender fairness tasks, but
also forgets less and performs better on GLUE by a large margin.
( 2
min )
The concepts of overfitting and generalization are vital for evaluating
machine learning models. In this work, we show that the popular Recall@K metric
depends on the number of classes in the dataset, which limits its ability to
estimate generalization. To fix this issue, we propose a new metric, which
measures retrieval performance, and, unlike Recall@K, estimates generalization.
We apply the proposed metric to popular image retrieval methods and provide new
insights about deep metric learning generalization.
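For concreteness, the standard Recall@K used in image retrieval looks like this (the paper's proposed replacement metric is not reproduced here): a query scores 1 if at least one of its K nearest neighbours, excluding itself, shares its class label.

```python
import numpy as np

def recall_at_k(emb, labels, k):
    # pairwise Euclidean distances between embeddings
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude the query itself
    nn = np.argsort(d, axis=1)[:, :k]
    hits = [(labels[nn[i]] == labels[i]).any() for i in range(len(emb))]
    return float(np.mean(hits))

emb = np.array([[0.0], [0.1], [5.0], [5.1], [10.0]])
labels = np.array([0, 0, 1, 1, 2])
r1 = recall_at_k(emb, labels, k=1)         # the singleton class can never hit
```

The singleton-class query illustrates the abstract's point: the score depends on how the classes populate the dataset, not only on embedding quality.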
( 2
min )
We present DiffInfinite, a hierarchical diffusion model that generates
arbitrarily large histological images while preserving long-range correlation
structural information. Our approach first generates synthetic segmentation
masks, subsequently used as conditions for the high-fidelity generative
diffusion process. The proposed sampling method can be scaled up to any desired
image size while only requiring small patches for fast training. Moreover, it
can be parallelized more efficiently than previous large-content generation
methods while avoiding tiling artefacts. The training leverages classifier-free
guidance to augment a small, sparsely annotated dataset with unlabelled data.
Our method alleviates unique challenges in histopathological imaging practice:
large-scale information, costly manual annotation, and protective data
handling. The biological plausibility of DiffInfinite data is validated in a
survey by ten experienced pathologists as well as a downstream segmentation
task. Furthermore, the model scores strongly on anti-copying metrics, which is
beneficial for the protection of patient data.
( 2
min )
Deep learning approaches for jet tagging in high-energy physics are
characterized as black boxes that process a large amount of information from
which it is difficult to extract key distinctive observables. In this
proceeding, we present an alternative to deep learning approaches, Boost
Invariant Polynomials, which enables direct analysis of simple analytic
expressions representing the most important features in a given task. Further,
we show how this approach provides an extremely low-dimensional classifier
with a minimum set of features representing effective, physically relevant
discriminating observables, and how it consequently speeds up the algorithm
execution while performing relatively close to the algorithm using the full
information.
( 2
min )
We consider an aggregated human-AI collaboration aimed at generating a joint
interpretable model. The model takes the form of Boolean decision rules, where
human input is provided in the form of logical conditions or as partial
templates. This focus on the combined construction of a model offers a
different perspective on joint decision making. Previous efforts have typically
focused on aggregating outcomes rather than decision logic. We demonstrate the
proposed approach through two examples and highlight the usefulness and
challenges of the approach.
( 2
min )
A recent alternative for hydrogen transportation is blending it into existing
natural gas pipelines as a mixture with natural gas. However, hydrogen
embrittlement of the pipe material is a major concern for scientists and gas
installation designers seeking to avoid process failures. In this paper, we
propose a physics-informed machine
learning model to predict the gas pressure on the pipes' inner walls. Despite
their high-fidelity results, current PDE-based simulators are time- and
computationally demanding. Using simulation data, we train an ML model to
predict the pressure on the pipelines' inner walls, which is a first step
toward pipeline system surveillance. We found that the physics-based method
outperformed the purely data-driven method and satisfies the physical
constraints of the gas flow system.
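The physics-informed objective can be sketched generically (the governing equation, data, and constants below are stand-ins, not the paper's pipe-flow model): the loss adds a penalty on the residual of the governing equation to the usual data misfit, so the learned pressure profile respects the physics even between simulation samples.

```python
import numpy as np

def piml_loss(params, x, p_data, c=2.0, lam=1.0):
    a, b = params                       # linear pressure model p(x) = a x + b
    p = a * x + b
    data_loss = np.mean((p - p_data) ** 2)
    dpdx = np.gradient(p, x)            # finite-difference derivative
    physics_loss = np.mean((dpdx + c) ** 2)   # residual of dp/dx = -c
    return data_loss + lam * physics_loss

x = np.linspace(0.0, 1.0, 20)
p_data = 10.0 - 2.0 * x                 # synthetic "simulator" pressures
good = piml_loss((-2.0, 10.0), x, p_data)    # fits data and physics
bad = piml_loss((0.0, 9.0), x, p_data)       # ignores the physics term
```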
( 2
min )
The accurate prediction and estimation of annual snow accumulation has grown
in importance as we deal with the effects of climate change and the increase of
global atmospheric temperatures. Airborne radar sensors, such as the Snow
Radar, are able to measure accumulation rate patterns at a large-scale and
monitor the effects of ongoing climate change on Greenland's precipitation and
run-off. The Snow Radar's use of an ultra-wide bandwidth enables a fine
vertical resolution that helps in capturing internal ice layers. Given the
snow accumulation measured in previous years from the radar data, in this
paper we propose a machine learning model based on recurrent graph
convolutional networks to predict the snow accumulation of recent consecutive
years at a certain location. We found that the model performs better and with
more consistency than equivalent nongeometric and nontemporal models.
( 2
min )
End-to-end design of communication systems using deep autoencoders (AEs) is
gaining attention due to its flexibility and excellent performance. Besides
single-user transmission, AE-based design is recently explored in multi-user
setup, e.g., for designing constellations for non-orthogonal multiple access
(NOMA). In this paper, we further advance the design of AE-based downlink NOMA
by introducing a weighted loss function in the AE training. By changing the
weight coefficients, one can flexibly tune the constellation design to balance
error probability of different users, without relying on explicit information
about their channel quality. Combined with the SICNet decoder, the proposed
weighted AE-based framework demonstrates a significant improvement in
achievable error-probability levels and flexible control over the error
probability of each user.
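The weighted training objective can be pictured as a normalized weighted sum of per-user losses. A minimal sketch (the function name and normalization below are illustrative assumptions, not the paper's exact formulation):

```python
def weighted_multiuser_loss(per_user_losses, weights):
    """Weighted sum of per-user loss terms, normalized by the total
    weight. Raising one user's weight steers training toward lowering
    that user's error probability at the expense of the others."""
    assert len(per_user_losses) == len(weights)
    total = sum(weights)
    return sum(w * l for w, l in zip(per_user_losses, weights)) / total
```

With equal weights this reduces to the plain average of the user losses; skewing the weights tilts the constellation design toward the favored user.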
( 2
min )
This paper studies the impact of climate data and vector larval indices on
dengue outbreaks. After a comparative study of various LSTM models, a
Bidirectional Stacked LSTM network is selected to analyze the time series
climate and health data collected for the state of Tamil Nadu (India) over the
period 2014 to 2020. The prediction accuracy of the model is significantly
improved by including the mosquito larval index, an indicator of vector-borne
disease (VBD) control measures.
( 2
min )
To achieve virtual certification for industrial design, quantifying the
uncertainties in simulation-driven processes is crucial. We discuss a
physics-constrained approach to account for epistemic uncertainty of turbulence
models. In order to eliminate user input, we incorporate a data-driven machine
learning strategy. In addition, our study focuses on developing an a priori
estimate of prediction confidence when accurate data are scarce.
( 2
min )
Recent advancements in federated learning (FL) seek to increase client-level
performance by fine-tuning client parameters on local data or personalizing
architectures for the local task. Existing methods for such personalization
either prune a global model or fine-tune a global model on a local client
distribution. However, these existing methods either personalize at the expense
of retaining important global knowledge, or predetermine network layers for
fine-tuning, resulting in suboptimal storage of global knowledge within client
models. Inspired by the lottery ticket hypothesis, we first introduce a
hypothesis for finding optimal client subnetworks to locally fine-tune while
leaving the rest of the parameters frozen. Using this procedure, we then
propose a novel FL framework, FedSelect, that directly personalizes both
client subnetwork structure and parameters, simultaneously discovering the
optimal parameters for personalization and the remaining parameters for global
aggregation during training. We show that this method achieves promising
results on CIFAR-10.
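One way to picture the subnetwork-selection step is to rank parameters by how far local training moves them, keeping the most-drifted ones for personalization (an illustrative criterion; FedSelect's actual selection rule may differ):

```python
def select_personal_subnetwork(global_params, local_params, fraction):
    """Pick the parameter indices whose values moved most under local
    training; those are fine-tuned locally, the rest stay frozen and
    are shared for global aggregation. Illustrative criterion only."""
    drifts = [abs(l - g) for g, l in zip(global_params, local_params)]
    k = max(1, int(fraction * len(global_params)))
    # Indices sorted by drift, largest first.
    order = sorted(range(len(drifts)), key=drifts.__getitem__, reverse=True)
    personal = set(order[:k])
    frozen = set(range(len(global_params))) - personal
    return personal, frozen
```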
( 2
min )
Motivated by the novel paradigm developed by Van Roy and coauthors for
reinforcement learning in arbitrary non-Markovian environments, we propose a
related formulation and explicitly pin down the error caused by
non-Markovianity of observations when the Q-learning algorithm is applied to
this formulation. Based on this observation, we propose that the criterion for
agent design should be to seek good approximations for certain conditional
laws. Inspired by classical stochastic control, we show that our problem
reduces to that of recursive computation of approximate sufficient statistics.
This leads to an autoencoder-based scheme for agent design which is then
numerically tested on partially observed reinforcement learning environments.
( 2
min )
Predicting the presence of major depressive disorder (MDD) using behavioural
and cognitive signals is a highly non-trivial task. The heterogeneous clinical
profile of MDD means that any given speech, facial expression and/or observed
cognitive pattern may be associated with a unique combination of depressive
symptoms. Conventional discriminative machine learning models potentially lack
the complexity to robustly model this heterogeneity. Bayesian networks,
however, may be well-suited to such a scenario. These networks are
probabilistic graphical models that efficiently describe the joint probability
distribution over a set of random variables by explicitly capturing their
conditional dependencies. This framework provides further advantages over
standard discriminative modelling by offering the possibility to incorporate
expert opinion in the graphical structure of the models, generating explainable
model predictions, informing about the uncertainty of predictions, and
naturally handling missing data. In this study, we apply a Bayesian framework
to capture the relationships between depression, depression symptoms, and
features derived from speech, facial expression and cognitive game data
collected at thymia.
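The conditioning such networks perform reduces, in the smallest case, to a Bayes update over a single dependency edge (the variables and numbers below are hypothetical, not thymia's model):

```python
def posterior_probability(prior, lik_pos, lik_neg):
    """Bayes update in a two-node network D -> S: given P(D) and the
    conditional likelihoods P(S | D) and P(S | not D), return P(D | S).
    Larger Bayesian networks chain such conditionals along the graph,
    which is what keeps the joint distribution compact."""
    num = lik_pos * prior
    return num / (num + lik_neg * (1.0 - prior))
```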
( 2
min )
In this brief note, we formulate Principal Component Analysis (PCA) over
datasets consisting not of points but of distributions, characterized by their
location and covariance. Just as the usual PCA on points can be equivalently
derived via a variance-maximization principle and via a minimization of
reconstruction error, we derive a closed-form solution for distributional PCA
from both of these perspectives.
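One way such a closed form can arise is through the law of total covariance: the scatter of the collection is the covariance of the locations plus the average member covariance, and the principal direction is the top eigenvector of that matrix. A 2-D sketch under that assumption (equal weights; the note's exact derivation may differ):

```python
import math

def distributional_pca_2d(means, covs):
    """Leading principal direction for a set of 2-D distributions given
    by their means and 2x2 covariances. Total scatter = covariance of
    the locations + average member covariance (law of total covariance);
    we return the top eigenvector of that 2x2 matrix in closed form."""
    n = len(means)
    mx = sum(m[0] for m in means) / n
    my = sum(m[1] for m in means) / n
    # Covariance of the locations.
    sxx = sum((m[0] - mx) ** 2 for m in means) / n
    syy = sum((m[1] - my) ** 2 for m in means) / n
    sxy = sum((m[0] - mx) * (m[1] - my) for m in means) / n
    # Add the average member covariance.
    sxx += sum(c[0][0] for c in covs) / n
    syy += sum(c[1][1] for c in covs) / n
    sxy += sum(c[0][1] for c in covs) / n
    # Top eigenvector of [[sxx, sxy], [sxy, syy]].
    lam = 0.5 * (sxx + syy + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2))
    if abs(sxy) < 1e-12:
        return (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)
```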
( 2
min )
We study the asymptotic generalization of an overparameterized linear model
for multiclass classification under the Gaussian covariates bi-level model
introduced in Subramanian et al.~'22, where the number of data points,
features, and classes all grow together. We fully resolve the conjecture posed
in Subramanian et al.~'22, matching the predicted regimes for generalization.
Furthermore, our new lower bounds are akin to an information-theoretic strong
converse: they establish that the misclassification rate goes to 0 or 1
asymptotically. One surprising consequence of our tight results is that the
min-norm interpolating classifier can be asymptotically suboptimal relative to
noninterpolating classifiers in the regime where the min-norm interpolating
regressor is known to be optimal.
The key to our tight analysis is a new variant of the Hanson-Wright
inequality which is broadly useful for multiclass problems with sparse labels.
As an application, we show that the same type of analysis can be used to
analyze the related multilabel classification problem under the same bi-level
ensemble.
( 2
min )
We establish generic uniform convergence guarantees for Gaussian data in
terms of the Rademacher complexity of the hypothesis class and the Lipschitz
constant of the square root of the scalar loss function. We show how these
guarantees substantially generalize previous results based on smoothness
(Lipschitz constant of the derivative), and allow us to handle the broader
class of square-root-Lipschitz losses, which includes also non-smooth loss
functions appropriate for studying phase retrieval and ReLU regression, as well
as rederive and better understand "optimistic rate" and interpolation learning
guarantees.
( 2
min )
In many numerical simulations, stochastic gradient descent (SGD) type
optimization methods perform very effectively in the training of deep neural
networks (DNNs), but to this day it remains an open research problem to
provide a mathematical convergence analysis that rigorously explains the
success of SGD type optimization methods in the training of DNNs. In this work
we study SGD type optimization methods in the training of fully-connected
feedforward DNNs with rectified linear unit (ReLU) activation. We first
establish general regularity properties for the risk functions and their
generalized gradient functions appearing in the training of such DNNs and,
thereafter, we investigate the plain vanilla SGD optimization method in the
training of such DNNs under the assumption that the target function under
consideration is a constant function. Specifically, we prove that if the
learning rates (the step sizes of the SGD optimization method) are
sufficiently small but not $L^1$-summable and the target function is a
constant function, then the expected risk of the considered SGD process
converges to zero in the training of such DNNs as the number of SGD steps
increases to infinity.
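The step-size condition (sufficiently small but not $L^1$-summable) is satisfied, for example, by the schedule $\gamma_k = c/k$: the individual steps vanish while the partial sums diverge like $c \ln n$. A quick numeric check:

```python
def step_size_partial_sum(c, n):
    """Partial sum of the learning rates gamma_k = c / k for k = 1..n.
    The steps shrink to zero, yet the partial sums grow without bound
    (roughly c * ln n), so the schedule is not L^1-summable."""
    return sum(c / k for k in range(1, n + 1))
```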
( 3
min )
We consider a new framework where a continuous, though bounded, random
variable has unobserved bounds that vary over time. In the context of
univariate time series, we look at the bounds as parameters of the distribution
of the bounded random variable. We introduce an extended log-likelihood
estimation and design algorithms to track the bound through online maximum
likelihood estimation. Since the resulting optimization problem is not convex,
we make use of recent theoretical results on Normalized Gradient Descent (NGD)
for quasiconvex optimization, to eventually derive an Online Normalized
Gradient Descent algorithm. We illustrate and discuss the workings of our
approach based on both simulation studies and a real-world wind power
forecasting problem.
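The NGD update uses only the gradient's direction, discarding its magnitude, which is what makes it robust for quasiconvex objectives whose gradients can vanish or explode. A 1-D sketch (the online bound-tracking machinery of the paper is omitted):

```python
def ngd_minimize(grad, x0, steps, lr):
    """Normalized Gradient Descent: take a fixed-length step of size lr
    against the gradient direction; only the sign/direction of the
    gradient is used."""
    x = x0
    for _ in range(steps):
        g = grad(x)
        if g == 0.0:
            break
        x -= lr * g / abs(g)
    return x
```

For instance, f(x) = sqrt(|x - 3|) is quasiconvex with an exploding gradient at the minimizer, yet the unit-length steps still home in on x = 3 up to the step size.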
( 2
min )
Most iterative neural network training methods use estimates of the loss
function over small random subsets (or minibatches) of the data to update the
parameters, which aid in decoupling the training time from the (often very
large) size of the training datasets. Here, we show that a minibatch approach
can also be used to train neural network ensembles (NNEs) via trajectory
methods in a highly efficient manner. We illustrate this approach by training
NNEs to classify images in the MNIST dataset. The method improves training
times, which scale as the ratio of the dataset size to the average minibatch
size; for MNIST, this typically yields a computational improvement of two
orders of magnitude. We
highlight the advantage of using longer trajectories to represent NNEs, both
for improved accuracy in inference and reduced update cost in terms of the
samples needed in minibatch updates.
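The minibatch estimate at the heart of the approach can be sketched as follows (the trajectory-sampling machinery of the paper is omitted):

```python
import random

def minibatch_loss_estimate(data, loss_fn, batch_size, rng):
    """Unbiased estimate of the mean loss over `data` computed from a
    random minibatch, decoupling per-update cost from dataset size."""
    batch = rng.sample(data, batch_size)
    return sum(loss_fn(x) for x in batch) / batch_size
```

The per-update speedup is roughly len(data) / batch_size; for MNIST's 60k training images and batches of a few hundred, that is about two orders of magnitude, consistent with the abstract's estimate.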
( 2
min )
While research in the field of transformer models has primarily focused on
enhancing performance metrics such as accuracy and perplexity, practical
applications in industry often necessitate a rigorous consideration of
inference latency constraints. Addressing this challenge, we introduce
SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes
accuracy whilst adhering to an upper-bound latency constraint. Our method
incorporates 8-bit integer quantization in the search process to outperform the
current state-of-the-art technique. Our results underline the feasibility and
efficacy of seeking an optimal balance between performance and latency,
providing new avenues for deploying state-of-the-art transformer models in
latency-sensitive environments.
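Symmetric 8-bit integer quantization, in its simplest form, rounds a real value to a multiple of a scale and clamps to the signed 8-bit range (SpeedLimit's actual scheme, e.g. zero points or per-channel scales, may be more involved):

```python
def quantize_int8(x, scale):
    """Symmetric 8-bit quantization: round to the nearest multiple of
    `scale` and clamp to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, round(x / scale)))

def dequantize_int8(q, scale):
    """Map the integer code back to a real value."""
    return q * scale
```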
( 2
min )
The recent advent of play-to-earn (P2E) systems in massively multiplayer
online role-playing games (MMORPGs) has made in-game goods interchangeable with
real-world values more than ever before. The goods in the P2E MMORPGs can be
directly exchanged with cryptocurrencies such as Bitcoin, Ethereum, or Klaytn
via blockchain networks. Unlike traditional in-game goods, once P2E goods have
been written to the blockchain, they cannot be restored by the game operation
teams, even in cases of chargeback fraud such as payment fraud, cancellation,
or refund. To tackle this problem, we propose a novel chargeback fraud
prediction method, PU GNN, which leverages graph attention networks with a PU
loss to capture both the players' in-game behavior and their P2E token
transaction patterns. With the adoption of a modified GraphSMOTE, the proposed
model handles the imbalanced distribution of labels in chargeback fraud
datasets. Experiments conducted on three real-world P2E MMORPG datasets
demonstrate that PU GNN achieves superior performance over previously
suggested methods.
( 3
min )
Task and Motion Planning (TAMP) approaches are effective at planning
long-horizon autonomous robot manipulation. However, because they require a
planning model, it can be difficult to apply them to domains where the
environment and its dynamics are not fully known. We propose to overcome these
limitations by leveraging deep generative modeling, specifically diffusion
models, to learn constraints and samplers that capture these
difficult-to-engineer aspects of the planning model. These learned samplers are
composed and combined within a TAMP solver to jointly find action parameter
values that satisfy the constraints along a plan. To tractably make
predictions for unseen objects in the environment, we define these samplers on
low-dimensional learned latent embeddings of changing object state. We evaluate
our approach in an articulated object manipulation domain and show how the
combination of classical TAMP, generative learning, and latent embeddings
enables long-horizon constraint-based reasoning.
( 2
min )
We study the cost of overfitting in noisy kernel ridge regression (KRR),
which we define as the ratio between the test error of the interpolating
ridgeless model and the test error of the optimally-tuned model. We take an
"agnostic" view in the following sense: we consider the cost as a function of
sample size for any target function, even if the sample size is not large
enough for consistency or the target is outside the RKHS. We analyze the cost
of overfitting under a Gaussian universality ansatz using recently derived
(non-rigorous) risk estimates in terms of the task eigenstructure. Our analysis
provides a more refined characterization of benign, tempered and catastrophic
overfitting (cf. Mallinar et al., 2022).
( 2
min )
Time series motifs are used for discovering higher-order structures of time
series data. Based on time series motifs, the motif embedding correlation field
(MECF) is proposed to characterize higher-order temporal structures of
dynamical system time series. A MECF-based unsupervised learning approach is
applied to locate the source of a forced oscillation (FO), a periodic
disturbance that detrimentally impacts power grids. Locating the FO source is
imperative for system stability. Compared with the Fourier analysis, the
MECF-based unsupervised learning is applicable under various FO situations,
including a single FO, FO with resonance, and FOs from multiple sources. The
MECF-based unsupervised learning is a data-driven approach that requires no
prior knowledge of system models or topologies. Tests on the UK
high-voltage transmission grid illustrate the effectiveness of MECF-based
unsupervised learning. In addition, the impacts of coupling strength and
measurement noise on locating the FO source by the MECF-based unsupervised
learning are investigated.
( 2
min )
In federated learning, data heterogeneity is a critical challenge. A
straightforward solution is to shuffle the clients' data to homogenize the
distribution. However, this may violate data access rights, and how and when
shuffling can accelerate the convergence of a federated optimization algorithm
is not theoretically well understood. In this paper, we establish a precise and
quantifiable correspondence between data heterogeneity and parameters in the
convergence rate when a fraction of data is shuffled across clients. We prove
that shuffling can quadratically reduce the gradient dissimilarity with respect
to the shuffling percentage, accelerating convergence. Inspired by the theory,
we propose a practical approach that addresses the data access rights issue by
shuffling locally generated synthetic data. The experimental results show that
shuffling synthetic data improves the performance of multiple existing
federated learning algorithms by a large margin.
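The quadratic reduction can be illustrated on a one-parameter model, under the assumption that shuffling a fraction p of the data moves each client gradient toward the global average as g_i -> (1 - p) * g_i + p * g_bar (a simplification of the paper's setting):

```python
def gradient_dissimilarity(client_grads):
    """Mean squared deviation of scalar client gradients from their
    average, a standard heterogeneity measure in FL analyses."""
    n = len(client_grads)
    mean = sum(client_grads) / n
    return sum((g - mean) ** 2 for g in client_grads) / n

def shuffle_fraction(client_grads, p):
    """Model the effect of shuffling a fraction p of each client's
    data: every client gradient moves toward the global average."""
    n = len(client_grads)
    mean = sum(client_grads) / n
    return [(1 - p) * g + p * mean for g in client_grads]
```

Under this model the dissimilarity after shuffling a fraction p is exactly (1 - p)^2 times the original, matching the quadratic reduction the theory predicts.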
( 2
min )
Training normalizing flow generative models can be challenging due to the
need to calculate computationally expensive determinants of Jacobians. This
paper studies the likelihood-free training of flows and proposes the energy
objective, an alternative sample-based loss based on proper scoring rules. The
energy objective is determinant-free and supports flexible model architectures
that are not easily compatible with maximum likelihood training, including
semi-autoregressive energy flows, a novel model family that interpolates
between fully autoregressive and non-autoregressive models. Energy flows
feature competitive sample quality, posterior inference, and generation speed
relative to likelihood-based flows; this performance is decorrelated from the
quality of log-likelihood estimates, which are generally very poor. Our
findings question the use of maximum likelihood as an objective or a metric,
and contribute to a scientific study of its role in generative modeling.
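The energy objective builds on the energy score, a proper scoring rule computable from samples alone, with no densities and hence no Jacobian determinants. A 1-D Monte Carlo sketch (the multivariate case replaces absolute values with Euclidean norms):

```python
def energy_score(samples, y):
    """Monte Carlo estimate of the 1-D energy score
    E|X - y| - 0.5 * E|X - X'| for model samples X and observation y.
    A proper scoring rule: minimized in expectation when the samples
    come from the distribution that generated y."""
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2
```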
( 2
min )
Graph generative model evaluation necessitates understanding differences
between graphs on the distributional level. This entails being able to harness
salient attributes of graphs in an efficient manner. Curvature constitutes one
such property of graphs, and has recently started to prove useful in
characterising graphs. Its expressive properties, stability, and practical
utility in model evaluation remain largely unexplored, however. We combine
graph curvature descriptors with emerging methods from topological data
analysis to obtain robust, expressive descriptors for evaluating graph
generative models.
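As one concrete curvature descriptor, the combinatorial Forman-Ricci curvature of an unweighted graph (ignoring 2-cells) is computable per edge from vertex degrees alone; the topological pipeline the paper builds on top of such descriptors is omitted here:

```python
def forman_curvature(degrees, edges):
    """Combinatorial Forman-Ricci curvature of each edge (u, v) of an
    unweighted graph without 2-cells: F(u, v) = 4 - deg(u) - deg(v).
    Aggregating these per-edge values yields a simple curvature
    descriptor of the whole graph."""
    return {(u, v): 4 - degrees[u] - degrees[v] for (u, v) in edges}
```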
( 2
min )
Gaussianization is a simple generative model that can be trained without
backpropagation. It has shown compelling performance on low-dimensional data.
As the dimension increases, however, it has been observed that the convergence
speed slows down. We show analytically that the number of required layers
scales linearly with the dimension for Gaussian input. We argue that this is
because the model is unable to capture dependencies between dimensions.
Empirically, we find the same linear increase in cost for arbitrary input
$p(x)$, but observe favorable scaling for some distributions. We explore
potential speed-ups and formulate challenges for further research.
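A single marginal-Gaussianization layer can be sketched as an empirical-CDF transform followed by the inverse standard normal CDF (distinct sample values are assumed; the rotations that alternate with these layers, and whose inability to capture cross-dimension dependence drives the layer count, are omitted):

```python
from statistics import NormalDist

def marginal_gaussianize(values):
    """One Gaussianization layer for a 1-D sample with distinct values:
    map each value through its empirical CDF (rank / (n + 1)), then
    through the inverse standard normal CDF."""
    n = len(values)
    ranks = {v: i + 1 for i, v in enumerate(sorted(values))}
    nd = NormalDist()
    return [nd.inv_cdf(ranks[v] / (n + 1)) for v in values]
```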
( 2
min )
Transformer architectures are complex, and while their use in NLP has
engendered many successes, it makes their interpretability or explainability
challenging. Recent debates have shown that attention maps and attribution
methods are unreliable (Pruthi et al., 2019; Brunner et al., 2019). In this
paper, we present some of their limitations and introduce COCKATIEL, which
successfully addresses some of them. COCKATIEL is a novel, post-hoc,
concept-based, model-agnostic XAI technique that generates meaningful
explanations from the last layer of a neural net model trained on an NLP
classification task. It uses Non-Negative Matrix Factorization (NMF) to
discover the concepts the model leverages to make predictions, and Sensitivity
Analysis to accurately estimate the importance of each of these concepts for
the model. It does so without compromising the accuracy of the underlying
model or requiring a new one to be trained. We conduct experiments on single-
and multi-aspect sentiment analysis tasks and show that, without any
supervision, COCKATIEL discovers concepts in Transformer models that align
with human ones; we objectively verify the faithfulness of its explanations
through fidelity metrics, and we showcase its ability to provide meaningful
explanations on two different datasets.
( 3
min )
Predictive pattern mining is an approach used to construct prediction models
when the input is represented by structured data, such as sets, graphs, and
sequences. The main idea behind predictive pattern mining is to build a
prediction model by considering substructures, such as subsets, subgraphs, and
subsequences (referred to as patterns), present in the structured data as
features of the model. The primary challenge in predictive pattern mining lies
in the exponential growth of the number of patterns with the complexity of the
structured data. In this study, we propose the Safe Pattern Pruning (SPP)
method to address the explosion of pattern numbers in predictive pattern
mining. We also discuss how it can be effectively employed throughout the
entire model building process in practical data analysis. To demonstrate the
effectiveness of the proposed method, we conduct numerical experiments on
regression and classification problems involving sets, graphs, and sequences.
( 2
min )
Amazon SageMaker Data Wrangler is a single visual interface that reduces the time required to prepare data and perform feature engineering from weeks to minutes with the ability to select and clean data, create features, and automate data preparation in machine learning (ML) workflows without writing any code. SageMaker Data Wrangler supports Snowflake, a popular […]
( 12
min )
For data scientists, moving machine learning (ML) models from proof of concept to production often presents a significant challenge. One of the main challenges can be deploying a well-performing, locally trained model to the cloud for inference and use in other applications. It can be cumbersome to manage the process, but with the right tool, […]
( 10
min )
Researchers at Yamagata University in Japan have harnessed AI to uncover four previously unseen geoglyphs — images on the ground, some as wide as 1,200 feet, made using the land’s elements — in Nazca, a seven-hour drive south of Lima, Peru. The geoglyphs — a humanoid, a pair of legs, a fish and a bird […]
( 4
min )
We compute how small input perturbations affect the output of deep neural
networks, exploring an analogy between deep networks and dynamical systems,
where the growth or decay of local perturbations is characterised by
finite-time Lyapunov exponents. We show that the maximal exponent forms
geometrical structures in input space, akin to coherent structures in dynamical
systems. Ridges of large positive exponents divide input space into different
regions that the network associates with different classes. These ridges
visualise the geometry that deep networks construct in input space, shedding
light on the fundamental mechanisms underlying their learning capabilities.
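The finite-time Lyapunov exponent itself measures the exponential growth rate of a small input perturbation. A scalar-map sketch of the quantity (the paper applies it to the input-output map of a deep network rather than an iterated 1-D map):

```python
import math

def finite_time_lyapunov(f, x0, steps, eps=1e-7):
    """Finite-time Lyapunov exponent of an iterated map, estimated by
    evolving two nearby initial conditions and measuring the growth of
    their separation: lambda = (1/T) * ln(|delta_T| / |delta_0|)."""
    a, b = x0, x0 + eps
    for _ in range(steps):
        a, b = f(a), f(b)
    return math.log(abs(b - a) / eps) / steps
```

For the linear expanding map f(x) = 2x the exponent is ln 2; positive exponents mark directions in which the map stretches nearby inputs apart, the analogue of the class-separating ridges in the abstract.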
( 2
min )
Spaces with locally varying scales of measurement, like multidimensional
structures with differently scaled dimensions, are common in statistics and
machine learning. Nevertheless, it remains an open question how to properly
exploit the entire information encoded in them. We address this
problem by considering an order based on (sets of) expectations of random
variables mapping into such non-standard spaces. This order contains stochastic
dominance and expectation order as extreme cases when no, or respectively
perfect, cardinal structure is given. We derive a (regularized) statistical
test for our proposed generalized stochastic dominance (GSD) order,
operationalize it by linear optimization, and robustify it by imprecise
probability models. Our findings are illustrated with data from
multidimensional poverty measurement, finance, and medicine.
( 2
min )
We focus on decentralized stochastic non-convex optimization, where $n$
agents work together to optimize a composite objective function which is a sum
of a smooth term and a non-smooth convex term. To solve this problem, we
propose two single-time scale algorithms: Prox-DASA and Prox-DASA-GT. These
algorithms can find $\epsilon$-stationary points in
$\mathcal{O}(n^{-1}\epsilon^{-2})$ iterations using constant batch sizes (i.e.,
$\mathcal{O}(1)$). Unlike prior work, our algorithms achieve comparable
complexity without requiring large batch sizes, more complex per-iteration
operations (such as double loops), or stronger assumptions. Our theoretical
findings are supported by extensive numerical experiments, which demonstrate
the superiority of our algorithms over previous approaches. Our code is
available at https://github.com/xuxingc/ProxDASA.
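The non-smooth convex term enters such methods through its proximal operator; for an $\ell_1$ penalty this is soft thresholding, the simplest instance of the proximal step involved (Prox-DASA's averaging and gradient tracking are omitted in this sketch):

```python
def soft_threshold(v, lam):
    """Proximal operator of lam * |x|: shrink v toward zero by lam,
    snapping small values exactly to zero."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def prox_gradient_step(x, grad, step, lam):
    """One proximal gradient update on smooth f(x) + lam * |x|:
    gradient step on f, then the proximal step on the penalty."""
    return soft_threshold(x - step * grad, step * lam)
```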
( 2
min )
In recent years, studies such as
\cite{carmon2019unlabeled,gowal2021improving,xing2022artificial} have
demonstrated that incorporating additional real or generated data with
pseudo-labels can enhance adversarial training through a two-stage training
approach. In this paper, we perform a theoretical analysis of the asymptotic
behavior of this method in high-dimensional linear regression. While a
double-descent phenomenon can be observed in ridgeless training, with an
appropriate $\mathcal{L}_2$ regularization, the two-stage adversarial training
achieves a better performance. Finally, we derive a shortcut cross-validation
formula specifically tailored for the two-stage training method.
( 2
min )
We show that any randomized first-order algorithm which minimizes a
$d$-dimensional, $1$-Lipschitz convex function over the unit ball must either
use $\Omega(d^{2-\delta})$ bits of memory or make $\Omega(d^{1+\delta/6-o(1)})$
queries, for any constant $\delta\in (0,1)$ and when the precision $\epsilon$
is quasipolynomially small in $d$. Our result implies that cutting plane
methods, which use $\tilde{O}(d^2)$ bits of memory and $\tilde{O}(d)$ queries,
are Pareto-optimal among randomized first-order algorithms, and quadratic
memory is required to achieve optimal query complexity for convex optimization.
( 2
min )
Markov chain Monte Carlo (MCMC) algorithms have played a significant role in
statistics, physics, machine learning, and other fields, and they are the only
known general and efficient approach for some high-dimensional problems. The
random walk Metropolis (RWM) algorithm, as the most classical MCMC algorithm,
has had a
great influence on the development and practice of science and engineering. The
behavior of the RWM algorithm in high-dimensional problems is typically
investigated through a weak convergence result of diffusion processes. In this
paper, we utilize the Mosco convergence of Dirichlet forms in analyzing the RWM
algorithm on large graphs, whose target distribution is the Gibbs measure that
includes any probability measure satisfying a Markov property. The abstract and
powerful theory of Dirichlet forms allows us to work directly and naturally on
the infinite-dimensional space, and our notion of Mosco convergence allows
Dirichlet forms associated with the RWM chains to lie on changing Hilbert
spaces. Through the optimal scaling problem, we demonstrate the impressive
strengths of the Dirichlet form approach over the standard diffusion approach.
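For reference, the RWM kernel itself is short; a minimal 1-D sketch (the paper's infinite-dimensional Gibbs-measure setting is not captured here):

```python
import math
import random

def random_walk_metropolis(log_target, x0, steps, scale, rng):
    """Random walk Metropolis: propose x' = x + scale * N(0, 1) and
    accept with probability min(1, target(x') / target(x)), computed
    in log space for numerical stability."""
    x = x0
    chain = [x]
    for _ in range(steps):
        prop = x + scale * rng.gauss(0.0, 1.0)
        if math.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop
        chain.append(x)
    return chain
```

Run against a standard normal target (log density -x^2/2, up to a constant), the chain's sample mean and variance approach 0 and 1; the proposal scale, here near the classical 2.38 optimum for one dimension, is exactly the quantity that optimal-scaling analyses such as this paper's study.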
( 2
min )
The library scikit-fda is a Python package for Functional Data Analysis
(FDA). It provides a comprehensive set of tools for representation,
preprocessing, and exploratory analysis of functional data. The library is
built upon and integrated into Python's scientific ecosystem. In particular, it
conforms to the scikit-learn application programming interface so as to take
advantage of the functionality for machine learning provided by this package:
pipelines, model selection, and hyperparameter tuning, among others. The
scikit-fda package has been released as free and open-source software under a
3-Clause BSD license and is open to contributions from the FDA community. The
library's extensive documentation includes step-by-step tutorials and detailed
examples of use.
( 2
min )
Six teams conducting research in AI, data science, and machine learning receive funding for projects that have potential commercial applications.
( 9
min )
Large AI models are transforming the digital world. Generative language models like Turing-NLG, ChatGPT, and GPT-4, powered by large language models (LLMs), are incredibly versatile, capable of performing tasks like summarization, coding, and translation. Similarly, large multimodal generative models like DALL·E, Microsoft Designer, and Bing Image Creator can generate art, architecture, videos, and other digital […]
The post DeepSpeed ZeRO++: A leap in speed for LLM and chat model training with 4X less communication appeared first on Microsoft Research.
( 16
min )
Researcher Bichlien Nguyen is an organic electrochemist turned technologist. Professor David Kwabi is a mechanical engineer. Their work uses ML to help discover organic compounds for renewable energy storage. Learn about their collaboration.
The post Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi appeared first on Microsoft Research.
( 33
min )
This post is co-written with Aruna Abeyakoon and Denisse Colin from Light and Wonder (L&W). Headquartered in Las Vegas, Light & Wonder, Inc. is the leading cross-platform global game company that provides gambling products and services. Working with AWS, Light & Wonder recently developed an industry-first secure solution, Light & Wonder Connect (LnW Connect), to […]
( 12
min )
Detecting delirium isn’t easy, but it can have a big payoff: speeding essential care to patients, leading to quicker and surer recovery. Improved detection also reduces the need for long-term skilled care, enhancing the quality of life for patients while decreasing a major financial burden. In the U.S., caring for those suffering from delirium costs […]
( 5
min )
Conquer the lands in Microsoft’s award-winning Age of Empires III: Definitive Edition. It leads 10 new games supported today on GeForce NOW. At Your Command Age of Empires III: Definitive Edition is a remaster of one of the most beloved real-time strategy franchises featuring improved visuals, enhanced gameplay, cross-platform multiplayer and more. Command mighty civilizations […]
( 4
min )
Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). Studio provides a single web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. Lifecycle configurations are shell scripts triggered by Studio lifecycle events, such as starting […]
( 7
min )
As a contact center agent, would you rather focus on having productive customer conversations or get distracted by having to look up customer information and knowledge articles that could exist in various systems? We’ve all been there. Having a productive conversation while multitasking is challenging. A single negative experience may put a dent on a […]
( 7
min )
Amir Anbarestani, an accomplished 3D artist who goes by the moniker Kingsletter, had a “shell of a good time” creating his Space Turtle scene this week In the NVIDIA Studio.
( 7
min )
Whether animating fish fins or fashioning chic outfits for digital characters, creators can tap Marvelous Designer software to compose and tailor assets, clothes and other materials for their 3D workflows.
( 5
min )
Machine learning (ML) engineers have traditionally focused on striking a balance between model training and deployment cost vs. performance. Increasingly, sustainability (energy efficiency) is becoming an additional objective for customers. This is important because training ML models and then using the trained models to make predictions (inference) can be highly energy-intensive tasks. In addition, more […]
( 8
min )
Generative AI will “supercharge” creators across industries and content types, NVIDIA founder and CEO Jensen Huang said today at the Cannes Lions Festival, on the French Riviera. “For the very first time, the creative process can be amplified in content generation, and the content generation could be in any modality — it could be text, […]
( 7
min )
Is optimized structural efficiency ‘human’ design? I recently read a paper titled “On the use of Artificial Neural Networks in Topology Optimisation,” about the process of topological optimization. In short, topological optimization is the process of determining the most efficient distribution of structural material for a given design. Typically there are simulation models involved…
The post DSC Weekly 20 June 2023 – Is optimized structural efficiency ‘human’ design? appeared first on Data Science Central.
( 20
min )
In the vast realm of artificial intelligence, few fields have captivated our imagination and pushed the boundaries of possibility quite like computer vision. At the core of this domain of research and innovation lies the ambition to empower technologies for real-world vision-based systems, enabling machines to take in and respond to visual stimuli with unparalleled […]
The post Microsoft at CVPR 2023: Pushing the boundaries of computer vision appeared first on Microsoft Research.
( 16
min )
Amazon SageMaker Studio is a web-based integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. For provisioning Studio in your AWS account and Region, you first need to create an Amazon SageMaker domain—a construct that encapsulates your ML environment. More concretely, a SageMaker domain […]
( 9
min )
Neural architecture search (NAS) for graph neural networks (GNNs), known as
NAS-GNN, has achieved significant performance gains over manually designed GNN
architectures. However, these methods inherit issues from conventional NAS
methods, such as high computational cost and optimization difficulty. More
importantly, previous NAS methods have ignored a unique property of GNNs: they
possess expressive power even without training. Exploiting this property, we
keep the weights randomly initialized and seek the optimal architecture
parameters via a sparse coding objective, deriving a novel NAS-GNN method
named neural architecture coding (NAC). Consequently, NAC requires no weight
updates on the GNNs and runs in linear time. Empirical evaluations on multiple GNN
benchmark datasets demonstrate that our approach leads to state-of-the-art
performance, which is up to $200\times$ faster and $18.8\%$ more accurate than
the strong baselines.
( 2
min )
We consider (stochastic) subgradient methods for strongly convex but
potentially nonsmooth non-Lipschitz optimization. We provide new equivalent
dual descriptions (in the style of dual averaging) for the classic subgradient
method, the proximal subgradient method, and the switching subgradient method.
These equivalences enable $O(1/T)$ convergence guarantees in terms of both
the classic primal gap and a previously unanalyzed dual gap for strongly
convex optimization. Consequently, our theory provides these classic methods
with simple, optimal stopping criteria and optimality certificates at no added
computational cost. Our results apply under nearly any stepsize selection and
for a range of non-Lipschitz ill-conditioned problems where the early
iterations of the subgradient method may diverge exponentially quickly (a
phenomenon which, to the best of our knowledge, no prior works address). Even
in the presence of such undesirable behaviors, our theory still ensures and
bounds eventual convergence.
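The strongly convex subgradient setup the paper revisits can be sketched on a toy problem; this is a generic illustration with the classical stepsize $2/(\mu(t+1))$, not the paper's dual-averaging reformulation, and the objective below is a made-up example:

```python
def subgradient_method(subgrad, x0, mu, T):
    """Subgradient method for a mu-strongly convex function, using the
    classical stepsize 2 / (mu * (t + 1)) that yields an O(1/T) primal gap."""
    x = x0
    for t in range(1, T + 1):
        x -= (2.0 / (mu * (t + 1))) * subgrad(x)
    return x

# Toy example: f(x) = |x| + 0.5 * x^2 is 1-strongly convex, nonsmooth at its
# minimizer x* = 0; one valid subgradient is sign(x) + x.
g = lambda x: (1.0 if x > 0 else -1.0 if x < 0 else 0.0) + x
x_final = subgradient_method(g, x0=10.0, mu=1.0, T=500)
```

Note that the first iterate overshoots (x jumps from 10 to -1) before settling down, a mild instance of the early-iteration misbehavior the abstract alludes to.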
( 2
min )
Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing
neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being
popular choices for cycling through random or single permutations of the
training data. However, the convergence properties of these algorithms in the
non-convex case are not fully understood. Existing results suggest that, in
realistic training scenarios where the number of epochs is smaller than the
training set size, RR may perform worse than SGD.
In this paper, we analyze a general SGD algorithm that allows for arbitrary
data orderings and show improved convergence rates for non-convex functions.
Specifically, our analysis reveals that SGD with random and single shuffling is
always faster or at least as good as classical SGD with replacement, regardless
of the number of iterations. Overall, our study highlights the benefits of
using SGD with random/single shuffling and provides new insights into its
convergence properties for non-convex optimization.
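The two sampling schemes being compared can be illustrated on a toy one-dimensional least-squares problem; this is a generic sketch, not the paper's analysis setting, and the function names are illustrative:

```python
import random

def sgd(data, epochs, lr, reshuffle):
    """Minimize the mean of (w - d)^2 over data points d with step size lr.
    reshuffle=True draws a fresh permutation each epoch (Random Reshuffling);
    reshuffle=False samples indices with replacement (classical SGD)."""
    w = 0.0
    n = len(data)
    for _ in range(epochs):
        order = random.sample(range(n), n) if reshuffle \
            else [random.randrange(n) for _ in range(n)]
        for i in order:
            w -= lr * 2.0 * (w - data[i])  # gradient of (w - d_i)^2
    return w

random.seed(0)
data = [1.0, 2.0, 3.0, 4.0]
w_rr = sgd(data, epochs=50, lr=0.05, reshuffle=True)
w_sgd = sgd(data, epochs=50, lr=0.05, reshuffle=False)
# Both runs should approach the minimizer, the mean 2.5; RR visits every
# point exactly once per epoch, removing the with-replacement sampling noise.
```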
( 2
min )
While several recent works have identified societal-scale and
extinction-level risks to humanity arising from artificial intelligence, few
have attempted an {\em exhaustive taxonomy} of such risks. Many exhaustive
taxonomies are possible, and some are useful -- particularly if they reveal new
risks or practical approaches to safety. This paper explores a taxonomy based
on accountability: whose actions lead to the risk, are the actors unified, and
are they deliberate? We also provide stories to illustrate how the various risk
types could each play out, including risks arising from unanticipated
interactions of many AI systems, as well as risks from deliberate misuse, for
which combined technical and policy solutions are indicated.
( 2
min )
Spectral-temporal graph neural network is a promising abstraction underlying
most time series forecasting models that are based on graph neural networks
(GNNs). However, little is known about the underpinnings of this branch of
methods. In this paper, we establish a theoretical framework that unravels
the expressive power of spectral-temporal GNNs. Our results show that linear
spectral-temporal GNNs are universal under mild assumptions, and their
expressive power is bounded by our extended first-order Weisfeiler-Leman
algorithm on discrete-time dynamic graphs. To make our findings practically
useful for valid instantiations, we discuss the related constraints in detail
and outline a theoretical blueprint for designing spatial and temporal modules
in spectral domains. Building on these insights, and to demonstrate how
powerful spectral-temporal GNNs can be under our framework, we propose a simple
instantiation named Temporal Graph GegenConv (TGC), which significantly
outperforms most existing models with only linear components and shows better
model efficiency.
( 2
min )
This paper presents a local-energy-distribution-based hyperparameter
determination method for stochastic simulated annealing (SSA). SSA is capable of
solving combinatorial optimization problems faster than typical simulated
annealing (SA), but requires a time-consuming hyperparameter search. The
proposed method determines hyperparameters based on the local energy
distributions of spins (probabilistic bits). The spin is a basic computing
element of SSA and is graphically connected to other spins with its weights.
The distribution of the local energy can be estimated based on the central
limit theorem (CLT). The CLT-based normal distribution is used to determine the
hyperparameters, which reduces the time complexity for hyperparameter search
from O(n^3) of the conventional method to O(1). The performance of SSA with the
determined hyperparameters is evaluated on the Gset and K2000 benchmarks for
maximum-cut problems. The results show that the proposed method achieves mean
cut values of approximately 98% of the best-known cut values.
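The CLT argument can be sketched as follows: the local energy of a spin, $E_i = \sum_j w_{ij} s_j$ with random neighbor spins $s_j \in \{-1, +1\}$, has mean 0 and variance $\sum_j w_{ij}^2$, so its distribution is available in closed form without any search. The function names are illustrative, and the paper's actual mapping from this distribution to the hyperparameters is not reproduced here:

```python
import math
import random

def local_energy_stats(weights_row):
    """CLT-based distribution of a spin's local energy E_i = sum_j w_ij * s_j,
    where each neighbor spin s_j is uniform in {-1, +1}: E_i is approximately
    Normal(0, sum_j w_ij^2). This gives the statistics in one pass per spin."""
    variance = sum(w * w for w in weights_row)
    return 0.0, math.sqrt(variance)

# Sanity check against Monte Carlo sampling of random spin configurations.
random.seed(1)
w = [1.0, -2.0, 0.5, 1.5]
mu, sigma = local_energy_stats(w)
samples = [sum(wj * random.choice([-1.0, 1.0]) for wj in w)
           for _ in range(20000)]
emp_mean = sum(samples) / len(samples)
emp_std = (sum((s - emp_mean) ** 2 for s in samples) / len(samples)) ** 0.5
```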
( 2
min )
The privacy-utility tradeoff remains one of the fundamental issues of
differentially private machine learning. This paper introduces a geometrically
inspired kernel-based approach to mitigate the accuracy-loss issue in
classification. In this approach, a representation of the affine hull of given
data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads
to a novel distance measure that hides privacy-sensitive information about
individual data points and improves the privacy-utility tradeoff by
significantly reducing the risk of membership inference attacks. The
effectiveness of the approach is demonstrated through experiments on the MNIST
dataset, the Freiburg groceries dataset, and a real biomedical dataset. It is
verified that the approach remains computationally practical. The application
of the approach to federated learning is also considered, and it is observed
that the accuracy loss due to the data being distributed is marginal.
( 2
min )
The intersection of machine learning and dynamical systems has generated
considerable interest recently. Neural Ordinary Differential Equations (NODEs)
represent a rich overlap between these fields. In this paper, we develop a
continuous time neural network approach based on Delay Differential Equations
(DDEs). Our model uses the adjoint sensitivity method to learn the model
parameters and delay directly from data. Our approach is inspired by that of
NODEs and extends earlier neural DDE models, which have assumed that the value
of the delay is known a priori. We perform a sensitivity analysis on our
proposed approach and demonstrate its ability to learn DDE parameters from
benchmark systems. We conclude our discussion with potential future directions
and applications.
( 2
min )
Out-of-distribution (OOD) generalization deals with the prevalent learning
scenario where test distribution shifts from training distribution. With rising
application demands and inherent complexity, graph OOD problems call for
specialized solutions. While data-centric methods exhibit performance
enhancements on many generic machine learning tasks, there is a notable absence
of data augmentation methods tailored for graph OOD generalization. In this
work, we propose to achieve graph OOD generalization with the novel design of
non-Euclidean-space linear extrapolation. The proposed augmentation strategy
extrapolates both structure and feature spaces to generate OOD graph data. Our
design tailors OOD samples for specific shifts without corrupting underlying
causal mechanisms. Theoretical analysis and empirical results demonstrate the
effectiveness of our method in solving target shifts, showing substantial and
consistent improvements across various graph OOD tasks.
( 2
min )
Due to the popularity of Graph Neural Networks (GNNs), various GNN-based
methods have been designed to reason on knowledge graphs (KGs). An important
design component of GNN-based KG reasoning methods is called the propagation
path, which contains a set of involved entities in each propagation step.
Existing methods use hand-designed propagation paths, ignoring the correlation
between the entities and the query relation. In addition, the number of
involved entities will explosively grow at larger propagation steps. In this
work, we are motivated to learn an adaptive propagation path in order to filter
out irrelevant entities while preserving promising targets. First, we design an
incremental sampling mechanism where the nearby targets and layer-wise
connections can be preserved with linear complexity. Second, we design a
learning-based sampling distribution to identify the semantically related
entities. Extensive experiments show that our method is powerful, efficient,
and semantic-aware. The code is available at
https://github.com/LARS-research/AdaProp.
( 2
min )
Score-based generative models (SGMs) learn a family of noise-conditional
score functions corresponding to the data density perturbed with increasingly
large amounts of noise. These perturbed data densities are linked together by
the Fokker-Planck equation (FPE), a partial differential equation (PDE)
governing the spatial-temporal evolution of a density undergoing a diffusion
process. In this work, we derive a corresponding equation called the score FPE
that characterizes the noise-conditional scores of the perturbed data densities
(i.e., their gradients). Surprisingly, despite the impressive empirical
performance, we observe that scores learned through denoising score matching
(DSM) fail to fulfill the underlying score FPE, which is an inherent
self-consistency property of the ground truth score. We prove that satisfying
the score FPE is desirable as it improves the likelihood and the degree of
conservativity. Hence, we propose to regularize the DSM objective to enforce
satisfaction of the score FPE, and we show the effectiveness of this approach
across various datasets.
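The self-consistency condition can be sketched from the forward diffusion $dx = f(x,t)\,dt + g(t)\,dw$, whose perturbed density $p_t$ obeys the Fokker-Planck equation $\partial_t p_t = -\nabla_x \cdot (f p_t) + \tfrac{1}{2} g^2 \Delta_x p_t$. Substituting $s = \nabla_x \log p_t$ and using $\Delta p / p = \nabla \cdot s + \|s\|^2$ gives one common form of a PDE on the scores (a sketch of the derivation; the paper's exact statement of the score FPE may differ):
$$\partial_t s(x,t) = \nabla_x \Big[ \tfrac{1}{2} g(t)^2 \big( \nabla_x \cdot s + \|s\|^2 \big) - f \cdot s - \nabla_x \cdot f \Big].$$
A learned score network $s_\theta$ can then be regularized by penalizing the squared residual of this PDE alongside the DSM objective.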
( 2
min )
CORL is an open-source library that provides thoroughly benchmarked
single-file implementations of both deep offline and offline-to-online
reinforcement learning algorithms. It emphasizes a simple development experience
with a straightforward codebase and a modern analysis tracking tool. In CORL,
we isolate each method's implementation in a separate single file, making
performance-relevant details easier to recognize. Additionally, an experiment
tracking feature is available to help log metrics, hyperparameters,
dependencies, and more to the cloud. Finally, we have ensured the reliability
of the implementations by benchmarking on the commonly employed D4RL datasets,
providing a transparent source of results that can be reused for robust
evaluation tools such as performance profiles, probability of improvement, or
expected online performance.
( 2
min )
Novel test selectors used in simulation-based verification have been shown to
significantly accelerate coverage closure regardless of the number of coverage
holes. This paper presents a configurable and highly-automated framework for
novel test selection based on neural networks. Three configurations of this
framework are tested with a commercial signal processing unit. All three
convincingly outperform random test selection, with the largest simulation
saving being 49.37% to reach 99.5% coverage. The computational expense of
the configurations is negligible compared to the simulation reduction. We
compare the experimental results and discuss important characteristics related
to the performance of the configurations.
( 2
min )
The quasiparticle effective mass $m^\ast$ of interacting electrons is a
fundamental quantity in Fermi liquid theory. However, the precise value of the
effective mass of the uniform electron gas is still elusive after decades of
research. The newly developed neural canonical transformation approach [Xie et
al., J. Mach. Learn. 1, (2022)] offers a principled way to extract the
effective mass of electron gas by directly calculating the thermal entropy at
low temperature. The approach models a variational many-electron density matrix
using two generative neural networks: an autoregressive model for momentum
occupation and a normalizing flow for electron coordinates. Our calculation
reveals a suppression of effective mass in the two-dimensional spin-polarized
electron gas, which is more pronounced than previous reports in the low-density
strong-coupling region. This prediction calls for verification in
two-dimensional electron gas experiments.
( 2
min )
The transition to a fully renewable energy grid requires better forecasting
of demand at the low-voltage level to increase efficiency and ensure reliable
control. However, high fluctuations and increasing electrification cause huge
forecast variability, not reflected in traditional point estimates.
Probabilistic load forecasts take future uncertainties into account and thus
allow more informed decision-making for the planning and operation of
low-carbon energy systems. We propose an approach for flexible conditional
density forecasting of short-term load based on Bernstein polynomial
normalizing flows, where a neural network controls the parameters of the flow.
In an empirical study with 363 smart meter customers, our density predictions
compare favorably against Gaussian and Gaussian mixture densities. Also, they
outperform a non-parametric approach based on the pinball loss for 24h-ahead
load forecasting for two different neural network architectures.
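The core building block, a monotone Bernstein-polynomial transformation whose coefficients a neural network would emit, can be sketched as follows (an illustrative sketch; the function names and the softplus parameterization are assumptions, not the paper's exact construction):

```python
import math

def bernstein_transform(y, theta):
    """Bernstein polynomial of degree M = len(theta) - 1 on [0, 1]:
    B(y) = sum_k theta_k * binom(M, k) * y^k * (1 - y)^(M - k).
    B is monotone increasing whenever theta_0 <= theta_1 <= ... <= theta_M,
    which makes it a valid normalizing-flow transformation."""
    M = len(theta) - 1
    return sum(t * math.comb(M, k) * y ** k * (1.0 - y) ** (M - k)
               for k, t in enumerate(theta))

def increasing_coeffs(raw):
    """Map unconstrained network outputs to increasing theta via a
    cumulative softplus, enforcing monotonicity of the transform."""
    theta, acc = [], raw[0]
    theta.append(acc)
    for r in raw[1:]:
        acc += math.log1p(math.exp(r))  # softplus > 0 keeps theta increasing
        theta.append(acc)
    return theta

theta = increasing_coeffs([-1.0, 0.3, -0.2, 1.0])
vals = [bernstein_transform(k / 10.0, theta) for k in range(11)]
# B(0) = theta_0 and B(1) = theta_M; vals should be non-decreasing.
```

In the full flow, conditioning works by letting the neural network output the raw coefficients as a function of covariates, so each input gets its own monotone density transform.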
( 2
min )
We study the loss landscape of training problems for deep artificial neural
networks with a one-dimensional real output whose activation functions contain
an affine segment and whose hidden layers have width at least two. It is shown
that such problems possess a continuum of spurious (i.e., not globally optimal)
local minima for all target functions that are not affine. In contrast to
previous works, our analysis covers all sampling and parameterization regimes,
general differentiable loss functions, arbitrary continuous nonpolynomial
activation functions, and both the finite- and infinite-dimensional setting. It
is further shown that the appearance of the spurious local minima in the
considered training problems is a direct consequence of the universal
approximation theorem and that the underlying mechanisms also cause, e.g.,
$L^p$-best approximation problems to be ill-posed in the sense of Hadamard for
all networks that do not have a dense image. The latter result also holds
without the assumption of local affine linearity and without any conditions on
the hidden layers.
( 2
min )
In the pursuit of artificial general intelligence (AGI), we tackle
Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged
approach. We employ the Decision Transformer in an imitation learning paradigm
to model human problem-solving, and introduce an object detection algorithm,
the Push and Pull clustering method. This dual strategy enhances AI's ARC
problem-solving skills and provides insights for AGI progression. Yet, our work
reveals the need for advanced data collection tools, robust training datasets,
and refined model structures. This study highlights potential improvements for
Decision Transformers and propels future AGI research.
( 2
min )
We consider the problem of recovering a latent graph where the observations
at each node are \emph{aliased}, and transitions are stochastic. Observations
are gathered by an agent traversing the graph. Aliasing means that multiple
nodes emit the same observation, so the agent cannot know in which node it is
located. The agent needs to uncover the hidden topology as accurately as
possible and in as few steps as possible. This is equivalent to efficient
recovery of the transition probabilities of a partially observable Markov
decision process (POMDP) in which the observation probabilities are known. An
algorithm for efficiently exploring (and ultimately recovering) the latent
graph is provided. Our approach is exponentially faster than naive exploration
in a variety of challenging topologies with aliased observations while
remaining competitive with existing baselines in the unaliased regime.
( 2
min )
We develop information-geometric techniques to analyze the trajectories of
the predictions of deep networks during training. By examining the underlying
high-dimensional probabilistic models, we reveal that the training process
explores an effectively low-dimensional manifold. Networks with a wide range of
architectures, sizes, trained using different optimization methods,
regularization techniques, data augmentation techniques, and weight
initializations lie on the same manifold in the prediction space. We study the
details of this manifold to find that networks with different architectures
follow distinguishable trajectories but other factors have a minimal influence;
larger networks train along a similar manifold as that of smaller networks,
just faster; and networks initialized at very different parts of the prediction
space converge to the solution along a similar manifold.
( 2
min )
We propose a new method for optimistic planning in infinite-horizon
discounted Markov decision processes based on the idea of adding regularization
to the updates of an otherwise standard approximate value iteration procedure.
This technique allows us to avoid contraction and monotonicity arguments
typically required by existing analyses of approximate dynamic programming
methods, and in particular to use approximate transition functions estimated
via least-squares procedures in MDPs with linear function approximation. We use
our method to recover known guarantees in tabular MDPs and to provide a
computationally efficient algorithm for learning near-optimal policies in
discounted linear mixture MDPs from a single stream of experience, and show it
achieves near-optimal statistical guarantees.
( 2
min )
It is shown that over-parameterized neural networks can achieve minimax
optimal rates of convergence (up to logarithmic factors) for learning functions
from certain smooth function classes, if the weights are suitably constrained
or regularized. Specifically, we consider the nonparametric regression of
estimating an unknown $d$-variate function by using shallow ReLU neural
networks. It is assumed that the regression function is from the H\"older space
with smoothness $\alpha<(d+3)/2$ or a variation space corresponding to shallow
neural networks, which can be viewed as an infinitely wide neural network. In
this setting, we prove that least squares estimators based on shallow neural
networks with certain norm constraints on the weights are minimax optimal, if
the network width is sufficiently large. As a byproduct, we derive a new
size-independent bound for the local Rademacher complexity of shallow ReLU
neural networks, which may be of independent interest.
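The kind of weight norm being constrained can be illustrated with a small sketch; the path-norm expression below is one standard choice for shallow ReLU networks and is assumed here for illustration rather than taken from the paper:

```python
def relu(z):
    return z if z > 0.0 else 0.0

def predict(x, params):
    """Shallow (one-hidden-layer) ReLU network on a scalar input:
    f(x) = sum_k a_k * relu(w_k * x + b_k), params = [(a_k, w_k, b_k), ...]."""
    return sum(a * relu(w * x + b) for (a, w, b) in params)

def path_norm(params):
    """Path norm sum_k |a_k| * (|w_k| + |b_k|): a width-independent measure
    of network size that a norm-constrained least squares estimator would
    bound instead of limiting the number of neurons."""
    return sum(abs(a) * (abs(w) + abs(b)) for (a, w, b) in params)

params = [(1.0, 2.0, -1.0), (-0.5, 1.0, 0.5)]
y = predict(1.0, params)   # 1 * relu(1) - 0.5 * relu(1.5) = 0.25
pn = path_norm(params)     # 1 * (2 + 1) + 0.5 * (1 + 0.5) = 3.75
```

Constraining such a norm while letting the width grow is what allows the estimator to remain minimax optimal despite over-parameterization.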
( 2
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )